Overview of Tree Algorithms
from Decision Tree to xgboost
Takami Sato
Agenda
• Xgboost occupied Kaggle
• Decision Tree
• Random Forest
• Gradient Boosting Tree
• Extreme Gradient Boosting(xgboost)
– Dart
Xgboost occupied Kaggle
More than half of the winning solutions in machine learning challenges hosted at Kaggle adopt XGBoost.
http://www.kdnuggets.com/2016/03/xgboost-implementing-winningest-kaggle-algorithm-spark-flink.html
Awesome XGBoost
• Vlad Sandulescu, Mihai Chiru, 1st place of the KDD Cup 2016 competition. Link to the arxiv paper.
• Marios Michailidis, Mathias Müller and HJ van Veen, 1st place of the Dato Truely Native? competition. Link to the Kaggle interview.
• Vlad Mironov, Alexander Guschin, 1st place of the CERN LHCb experiment Flavour of Physics competition. Link to the Kaggle interview.
• Josef Slavicek, 3rd place of the CERN LHCb experiment Flavour of Physics competition. Link to the Kaggle interview.
• Mario Filho, Josef Feigl, Lucas, Gilberto, 1st place of the Caterpillar Tube Pricing competition. Link to the Kaggle interview.
• Qingchen Wang, 1st place of the Liberty Mutual Property Inspection. Link to the Kaggle interview.
• Chenglong Chen, 1st place of the Crowdflower Search Results Relevance. Link to the winning solution.
• Alexandre Barachant (“Cat”) and Rafał Cycoń (“Dog”), 1st place of the Grasp-and-Lift EEG Detection. Link to the Kaggle interview.
• Halla Yang, 2nd place of the Recruit Coupon Purchase Prediction Challenge. Link to the Kaggle interview.
• Owen Zhang, 1st place of the Avito Context Ad Clicks competition. Link to the Kaggle interview.
• Keiichi Kuroyanagi, 2nd place of the Airbnb New User Bookings. Link to the Kaggle interview.
• Marios Michailidis, Mathias Müller and Ning Situ, 1st place Homesite Quote Conversion. Link to the Kaggle interview.
Awesome XGBoost: Machine Learning Challenge Winning Solutions
https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions
What’s happened?
XGBoost is a library for the gradient boosting tree model.
Decision Tree → Random Forest → Gradient Boosting Tree → xgboost(?)
What happened during this evolution?
Decision Trees were the beginning of everything.

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
cited from http://scikit-learn.org/stable/modules/tree.html
[Figure: an example decision tree in which each internal node applies a decision rule (decision rules 1-4) and the resulting nodes are labeled A-E.]
How were the rules found?
Set a metric that evaluates the impurity of a split of the data, then minimize that metric at each node.

Classification:
• Gini impurity (CART)
• Entropy (C4.5)

Regression:
• Variance

Notation:
p_k: probability of an item with label k
K: number of classes
SD(S): standard deviation of set S
S_L, S_R: left and right splits of a node
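The slide shows the formulas as images. For reference, here is a minimal Python sketch of the standard definitions (Gini impurity, entropy with the natural logarithm, population variance) plus the weighted-average split score used in the examples that follow; the function names are my own, not from the slides.

import numpy as np

def gini(labels):
    # Gini impurity: sum_k p_k * (1 - p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def entropy(labels):
    # Entropy with natural log: -sum_k p_k * log(p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def variance(values):
    # Population variance of the target values (regression)
    return float(np.var(values))

def split_score(metric, left, right):
    # Weighted average of the metric over the two children of a split
    n = len(left) + len(right)
    return len(left) / n * metric(left) + len(right) / n * metric(right)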
Examples
Classification: predict whether a person survived or not, using this sample from the Titanic dataset.

sex      age  survived
female    29         1
male       1         1
female     2         0
male      30         0
female    25         0
male      48         1
female    63         1
male      39         0
female    53         1
male      71         0

Decide thresholds and calculate probabilities. Gini impurity of the root node (5 of 10 survived): 0.5.

Split on age:
  age >= 40: 3 survived out of 4 people, probability 0.75, Gini impurity 0.375
  age < 40:  2 survived out of 6 people, probability 0.33, Gini impurity 0.444
  weighted average Gini impurity: 0.42 (0.08 down from 0.5)

Split on sex:
  male:   2 survived out of 5 people, probability 0.40, Gini impurity 0.480
  female: 3 survived out of 5 people, probability 0.60, Gini impurity 0.480
  weighted average Gini impurity: 0.48 (0.02 down from 0.5)
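As a sanity check, a short snippet (pandas-based; the variable names are mine) that reproduces the numbers above:

import pandas as pd

df = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "female",
            "male", "female", "male", "female", "male"],
    "age": [29, 1, 2, 30, 25, 48, 63, 39, 53, 71],
    "survived": [1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
})

def weighted_gini(mask):
    # weighted average Gini impurity of the two children defined by a boolean mask
    score = 0.0
    for part in (df[mask], df[~mask]):
        p = part["survived"].mean()
        score += len(part) / len(df) * 2 * p * (1 - p)
    return score

print(weighted_gini(df["age"] >= 40))      # ~0.42 (root Gini impurity is 0.5)
print(weighted_gini(df["sex"] == "male"))  # 0.48

The same loop with entropy in place of Gini impurity reproduces the 0.61 / 0.67 values on the next slide.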
Examples
Classification, continued: evaluating the same splits with entropy instead of Gini impurity. Entropy of the root node: 0.69.

Split on age:
  age >= 40: 3 survived out of 4 people, probability 0.75, entropy 0.562
  age < 40:  2 survived out of 6 people, probability 0.33, entropy 0.637
  weighted average entropy: 0.61 (0.08 down from 0.69)

Split on sex:
  male:   2 survived out of 5 people, probability 0.40, entropy 0.673
  female: 3 survived out of 5 people, probability 0.60, entropy 0.673
  weighted average entropy: 0.67 (0.02 down from 0.69)
Examples
Regression: predict a person's age from the same Titanic sample (age is now the target).

Calculate variances. Variance of the root node: 498.29.

Split on sex:
  male:   variance 524.56, 5 people
  female: variance 466.24, 5 people
  weighted average variance: 495.4 (2.89 down from 498.29)

Split on survived:
  survived = 0: variance 502.64, 5 people
  survived = 1: variance 479.36, 5 people
  weighted average variance: 491.0 (7.29 down from 498.29)
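The variance numbers can be checked the same way (numpy population variance; variable names are mine):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "female",
            "male", "female", "male", "female", "male"],
    "survived": [1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "age": [29, 1, 2, 30, 25, 48, 63, 39, 53, 71],
})
ages = df["age"].to_numpy()

print(np.var(ages))  # 498.29, variance of the root node

for col, value in [("sex", "male"), ("survived", 0)]:
    mask = (df[col] == value).to_numpy()
    weighted = (mask.sum() * np.var(ages[mask]) +
                (~mask).sum() * np.var(ages[~mask])) / len(ages)
    print(col, round(weighted, 2))  # sex: 495.4, survived: 491.0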
Other techniques for decision tree
Stopping criteria:
• Maximum depth
• Minimum leaf nodes

Finding a good threshold for numerical data (candidate split points):
• observed data points
• points where the class label changes
• percentiles of the data

Pruning the tree:
• Prune a subtree when its impurity metric (Gini, entropy, or variance) is above a threshold.

Notation (cited from PRML, formula (14.31)):
T: a subtree of the original tree
τ: index of the leaf nodes
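For reference, the pruning criterion cited from PRML trades residual impurity against tree size; as I recall it (treat the exact form as my paraphrase of the slide, not a quotation), it reads

C(T) = Σ_{τ=1}^{|T|} Q_τ(T) + λ|T|

where Q_τ(T) is the impurity of leaf τ, |T| is the number of leaves, and λ controls the trade-off between fit and tree size.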
Random Forest
https://stat.ethz.ch/education/semesters/ss2012/ams/slides/v10.2.pdf
Main ideas of Random Forest
• Bootstrapping data
• Random selection of features
• Ensembling trees
– Average
– Majority voting
Random Forest as a Feature Selector
Random Forest is difficult to interpret, but it can calculate some kinds of feature importance.
Gain-based importance
Sum up the gains of each split (finally, normalize all the importances).
In the age split from the earlier example, "Age" gets 0.08 feature-importance points.
Random Forest as a Feature Selector
Permutation-based importance: the decrease in accuracy after permuting each column.

Original data:
Target  Feat. 1  Feat. 2  Feat. 3  Feat. 4
0       1        2        11       101
1       2        3        12       102
1       3        5        13       103
0       4        7        14       104
Accuracy: 0.8

Data with Feat. 2 permuted:
Target  Feat. 1  Feat. 2  Feat. 3  Feat. 4
0       1        5        11       101
1       2        7        12       102
1       3        2        13       103
0       4        3        14       104
Accuracy: 0.7

Accuracy drops by 0.1, so Feature 2's importance is 0.1.
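A minimal sketch of this procedure for a fitted scikit-learn classifier (the model and validation data are placeholders you would supply; OOB samples or a held-out set are typically used):

import numpy as np

def permutation_importance(model, X_valid, y_valid, n_repeats=5, seed=0):
    # drop in accuracy after shuffling each column of X_valid
    rng = np.random.default_rng(seed)
    base = model.score(X_valid, y_valid)  # baseline accuracy
    importances = np.zeros(X_valid.shape[1])
    for j in range(X_valid.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_valid.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # permute one column
            drops.append(base - model.score(X_perm, y_valid))
        importances[j] = np.mean(drops)
    return importances

Recent scikit-learn versions also ship sklearn.inspection.permutation_importance, which implements essentially this procedure.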
Which importance is better?
Gain-based importance
  Pros: no additional computation needed; implemented in scikit-learn
  Cons: biased in favor of continuous variables and variables with many categories [Strobl+ 2008]

Permutation-based importance
  Pros: good for correlated variables?
  Cons: needs additional computation
It is still a controversial issue.
If you want to learn more, please check [Louppe+ 2013]
Out-of-bag (OOB) Error
In random forests, we can get an unbiased estimator of the test error without CV.
Procedure to get the OOB error (loop over the trees as they are constructed):
1. Bootstrap from all the data to build the kth tree; the samples left out are the OOB data for that tree.
2. Calculate an error for the kth tree on its OOB data.
3. Average the OOB errors over the data points.
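In scikit-learn this estimate is exposed directly through the oob_score option; a small usage sketch (the synthetic dataset is just a placeholder):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X, y)

print(rf.oob_score_)  # out-of-bag estimate of the generalization accuracy, no CV needed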
Scikit-learn options
n_estimators — number of trees
criterion — "gini" or "entropy"
max_features — the number of features to consider when looking for the best split
max_depth — the maximum depth of the tree
min_samples_split — the minimum number of samples required to split an internal node
min_samples_leaf — the minimum number of samples required to be at a leaf node
min_weight_fraction_leaf — the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node
max_leaf_nodes — grow trees with max_leaf_nodes in best-first fashion
min_impurity_split — threshold for early stopping in tree growth
bootstrap — whether bootstrap samples are used when building trees
oob_score — whether to use out-of-bag samples to estimate the generalization accuracy
warm_start — when set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new forest
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
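To connect the table to code, a typical construction with several of these options set (the values are illustrative, not recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,       # number of trees
    criterion="entropy",    # split quality measure
    max_features="sqrt",    # features considered at each split
    max_depth=10,           # limit tree depth
    min_samples_leaf=5,     # minimum samples per leaf
    n_jobs=-1,
    random_state=0,
)
rf.fit(X, y)
print(rf.feature_importances_)  # gain-based importances (see the slides above)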
Gradient Boosting Tree (GBT)
The Elements of Statistical Learning 2nd edition, p. 359
Key steps highlighted on the slide: compute the pseudo-residuals, then solve a 1-dimensional optimization for each leaf.
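A toy sketch of the idea for squared loss, where the pseudo-residuals are simply y minus the current prediction and the per-leaf optimization reduces to fitting those residuals; this is a simplified illustration, not the exact ESL algorithm:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    # gradient boosting for squared loss: each new tree fits the residuals
    f0 = float(np.mean(y))                 # initial constant prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred               # pseudo-residuals for squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)  # shrinkage (eta)
        trees.append(tree)
    return f0, trees

def predict_gbt(f0, trees, X, learning_rate=0.1):
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)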
XGBoost (eXtreme Gradient Boosting)
• xgboost is one of the implementations of GBT.
• Its splitting criterion is different from the criteria shown above.
The loss function is the training loss plus a regularization term that penalizes the number of leaves. xgboost also implements L1 regularization (we will see this later).
A splitting criterion derived directly from the loss function is the biggest contribution of xgboost.
Xgboost’s Split finding algorithms
Take a quadratic approximation of the loss, using the first-order gradient and the second-order gradient of the loss with respect to the current prediction.
Xgboost’s Split finding algorithms
Solve for the minimum by isolating w. The gain of this criterion when a node splits into L_L and L_R is xgboost's splitting criterion.
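The derivation on the slides is shown as images; for reference, a sketch of the resulting split gain as given in the XGBoost paper (Chen & Guestrin, 2016), where G and H are the sums of the first- and second-order gradients over the instances in a node, lam is the L2 penalty on leaf weights, and gamma is the per-leaf penalty:

def split_gain(G_left, H_left, G_right, H_right, lam, gamma):
    # optimal leaf weight is w* = -G / (H + lambda), giving objective -G^2 / (2 * (H + lambda));
    # the gain compares the parent against the two children and subtracts the per-leaf penalty gamma
    G, H = G_left + G_right, H_left + H_right
    gain = 0.5 * (G_left ** 2 / (H_left + lam)
                  + G_right ** 2 / (H_right + lam)
                  - G ** 2 / (H + lam)) - gamma
    return gain  # if gamma is large enough this is negative and the split is not made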
Xgboost’s Split finding algorithms
Xgboost’s Split finding algorithms for sparse data
Parameters of xgboost
• eta [default=0.3, range: [0,1]]
– step size shrinkage used in the update to prevent overfitting. After each boosting step we can directly get the weights of the new features, and eta shrinks the feature weights to make the boosting process more conservative.
https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
Update with shrinkage: each new tree's contribution is scaled by η before it is added to the model.
• gamma [default=0, range: [0,∞]]
– minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be.
If gamma is big enough, the gain term becomes negative (so the split is not made).
Parameters of xgboost
• max_depth [default=6, range: [1,∞]]
– maximum depth of a tree; increasing this value makes the model more complex and more likely to overfit.
• min_child_weight [default=1, range: [0,∞]]
– minimum sum of instance weight(hessian) needed in a child. If the tree partition step
results in a leaf node with the sum of instance weight less than min_child_weight, then
the building process will give up further partitioning. In linear regression mode, this
simply corresponds to minimum number of instances needed to be in each node. The
larger, the more conservative the algorithm will be.
If the sum of instance hessians in leaf j is less than min_child_weight, then stop partitioning.
Parameters of xgboost
• max_delta_step [default=0, range: [0,∞]]
– Maximum delta step we allow each tree's weight estimation to be. If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value of 1-10 might help control the update.
If the weight estimate exceeds max_delta_step, is it then clipped to max_delta_step? I am not sure; if someone knows, please tell me.
Parameters of xgboost
• subsample [default=1, range: (0,1]]
– subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the data instances to grow trees, which helps prevent overfitting.
• colsample_bylevel [default=1, range: (0,1]]
– subsample ratio of columns for each split, in each level.
• colsample_bytree [default=1, range: (0,1]]
– subsample ratio of columns when constructing each tree.
Parameters of xgboost
• lambda [default=1]
– L2 regularization term on weights; increasing this value makes the model more conservative.
• alpha [default=0]
– L1 regularization term on weights; increasing this value makes the model more conservative.
https://www.kaggle.com/forums/f/15/kaggle-forum/t/24181/xgboost-alpha-parameter/138272
https://github.com/dmlc/xgboost/blob/v0.60/src/tree/param.h#L178
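Putting several of these knobs together, an illustrative (not tuned) parameter dictionary for the native Python API; X and y stand for your own training data:

import xgboost as xgb

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # shrinkage / learning rate
    "gamma": 1.0,             # minimum loss reduction required to split
    "max_depth": 6,
    "min_child_weight": 1,    # minimum sum of instance hessians in a child
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # column subsampling per tree
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.0,             # L1 regularization on leaf weights
}

dtrain = xgb.DMatrix(X, label=y)  # X, y: your training data
bst = xgb.train(params, dtrain, num_boost_round=200)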
Parameters of xgboost
Please see Algorithm 1 and Algorithm 2.
• tree_method [default='auto']
– The tree construction algorithm used in XGBoost (see the description in the reference paper).
– The distributed and external-memory versions only support the approximate algorithm.
– Choices: {'auto', 'exact', 'approx'}
– 'auto': use a heuristic to choose the faster one.
  • For small to medium datasets, exact greedy will be used.
  • For very large datasets, the approximate algorithm will be chosen.
  • Because the old behavior was to always use exact greedy on a single machine, the user will get a message when the approximate algorithm is chosen, to notify them of this choice.
– 'exact': exact greedy algorithm.
– 'approx': approximate greedy algorithm using sketching and histograms.
• sketch_eps [default=0.03, range: (0, 1)]
– This is only used for the approximate greedy algorithm.
– It roughly translates into O(1 / sketch_eps) bins. Compared to directly selecting the number of bins, this comes with a theoretical guarantee on sketch accuracy.
– Usually the user does not have to tune this, but consider setting it to a lower number for more accurate enumeration.
I am not sure about this parameter, but the main developer also said the following.
Parameters for early stopping
• updater_seq [default="grow_colmaker,prune"]
– A comma-separated string specifying the sequence of tree updaters that should be run. A tree updater is a pluggable operation performed on the tree at every step using the gradient information. Tree updaters can be registered using the plugin system provided.
https://github.com/dmlc/xgboost/issues/1732
• num_round
– The number of rounds for boosting.
It is the counterpart of "n_estimators" in the scikit-learn API.
Parameters for early stopping
• early_stopping_rounds
– Activates early stopping. Validation error needs to decrease at least every <early_stopping_rounds>
round(s) to continue training. Requires at least one item in evals. If there’s more than one, will use the last.
Returns the model from the last iteration (not the best one). If early stopping occurs, the model will have
three additional fields: bst.best_score, bst.best_iteration and bst.best_ntree_limit. (Use bst.best_ntree_limit
to get the correct value if num_parallel_tree and/or num_class appears in the parameters)
• feval
– Customized evaluation function
Sample feval:

def sample_feval(preds, dtrain):
    # a customized evaluation function returns a (metric_name, value) pair
    labels = dtrain.get_label()
    some_metric = calc_some_metric(preds, labels)  # calc_some_metric: your own metric function
    return 'MCC', some_metric

If you have a validation set, you can tune the number of boosting rounds.
https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py
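A sketch of how these pieces fit together with the native API; dtrain and dvalid are placeholder DMatrix objects, and params / sample_feval are as defined above:

import xgboost as xgb

watchlist = [(dtrain, "train"), (dvalid, "valid")]  # the last item drives early stopping

bst = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=watchlist,
    feval=sample_feval,         # customized metric from the snippet above
    maximize=True,              # MCC should be maximized
    early_stopping_rounds=50,   # stop if "valid" has not improved for 50 rounds
)

print(bst.best_iteration, bst.best_score)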
DART [2015 Rashmi+]
• Applies the dropout technique to GBT (MART).
• DART prevents over-specialization.
– Trees added early contribute too much to the prediction.
– Shrinkage also mitigates over-specialization, but the authors claim it is not enough.
DART (Dropouts meet Multiple Additive Regression Trees)
DART [2015 Rashmi+]
Main steps of the algorithm: decide which trees are dropped, calculate the pseudo-residuals, and reduce the weights of the dropped trees.
Parameters for DART in xgboost
• normalize_type [default="tree"]
– type of normalization algorithm.
– "tree": new trees have the same weight as each of the dropped trees.
  • the weight of new trees is 1 / (k + learning_rate)
  • dropped trees are scaled by a factor of k / (k + learning_rate)
– "forest": new trees have the same weight as the sum of the dropped trees (the forest).
  • the weight of new trees is 1 / (1 + learning_rate)
  • dropped trees are scaled by a factor of 1 / (1 + learning_rate)
• sample_type [default="uniform"]
– type of sampling algorithm.
– "uniform": dropped trees are selected uniformly.
– "weighted": dropped trees are selected in proportion to their weight.
• rate_drop [default=0.0, range: [0.0, 1.0]]
– dropout rate.
• skip_drop [default=0.0, range: [0.0, 1.0]]
– probability of skipping the dropout.
  • If a dropout is skipped, new trees are added in the same manner as gbtree.
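DART is selected through the booster parameter; an illustrative configuration (values are examples only, dtrain as before):

import xgboost as xgb

dart_params = {
    "booster": "dart",          # use DART instead of the default gbtree
    "objective": "binary:logistic",
    "eta": 0.1,
    "max_depth": 6,
    "sample_type": "uniform",   # how dropped trees are selected
    "normalize_type": "tree",   # how new and dropped trees are re-weighted
    "rate_drop": 0.1,           # dropout rate
    "skip_drop": 0.5,           # probability of skipping dropout in a round
}

bst = xgb.train(dart_params, dtrain, num_boost_round=200)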