1 
Heuristic Design of Experiments 
with Meta-Gradient Search 
of Model Training Parameters 
SF Bay ACM, Data Mining SIG, Feb 28, 2011 
http://www.sfbayacm.org/?p=2464 
Greg_Makowski@yahoo.com 
www.LinkedIn.com/in/GregMakowski
Choice is good… but can be overwhelming 
2
Key Questions Discussed 
• You (a data miner) have many algorithms or libraries you can use, with many choices… 
– How to stay organized among all the choices? 
• Algorithm parameters 
• Adjustments in Cost vs. Profit (Type I vs. II error bias) 
• Metric selection (Lift if acting on the top %, vs. RMSE or ROC) 
• Ensemble Modeling: boosting, bagging, stacking 
• Data versions, preprocessing, trying new fields 
– How to plan, and learn as you go? 
– How simple should you stay, to keep descriptiveness vs. Occam’s Razor? 
3
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System – Sensitivity 
4
Enterprise Miner 
Sample Data Flow for a Project 
5 
(Boxes are expanded in later slides) 
[Flow diagram: Stratified Sampling splits the data into Learning, Tuning and Validation partitions.]
Type I vs. II Error Weights 
Profit-Loss Ratios 
6 
In the Data Source, NOT in the model engines. 
In other software, you may use a weight field instead. 
Need to stay organized regardless.
Regression 
• It is always good to find the best linear solution early on 
– Like testing a null hypothesis: (linear vs. non-linear) problem 
• Can feed the “score” or “residual error” as a source field into non-linear models 
7
Neural Net Architecture 
and Parameters 
8 
[Figure: a scatter of two classes ($ and c) over field 1 × field 2, with decision regions labeled RBF and MLP. A neural net solution is “non-linear”: several regions which are not adjacent.]
A Comparison of a Neural Net 
and Regression 
9 
A logistic regression formula: 
Y = f( a0 + a1*X1 + a2*X2 + a3*X3 ) 
a* are coefficients 
Backpropagation, cast in a similar form: 
H1 = f( w0 + w1*I1 + w2*I2 + w3*I3 ) 
H2 = f( w4 + w5*I1 + w6*I2 + w7*I3 ) 
: 
Hn = f( w8 + w9*I1 + w10*I2 + w11*I3 ) 
O1 = f( w12 + w13*H1 + .... + w15*Hn ) 
On = .... 
w* are weights, AKA coefficients 
I1..In are input nodes or input variables. 
H1..Hn are hidden nodes, which extract features of the data. 
O1..On are the outputs, which group disjoint categories. 
f() is the SIGMOID function, a non-linear “S” curve 
[Diagrams: the regression drawn as a single output Y fed by X1, X2, X3 with coefficients a1, a2, a3 and bias a0; the neural net drawn as inputs I1..I3 feeding hidden nodes H1, H2, … (weights w1, w2, w3 plus a bias), which feed the output. A “direct connect” links the inputs straight to the output.] 
(Aside: it is very noisy in the brain – chemical depletion of neurotransmitters.)
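
As a minimal sketch (assuming a single hidden layer, the sigmoid f(), and made-up weights; none of these numbers come from the slides), the two formulas above look like this in Python:

import numpy as np

def sigmoid(z):
    # f(): the non-linear "S" curve used by both models
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(x, a):
    # Y = f( a0 + a1*X1 + a2*X2 + a3*X3 )
    return sigmoid(a[0] + np.dot(a[1:], x))

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    # H_j = f( w_j0 + sum_i w_ji * I_i )  -- hidden nodes extract features of the data
    h = sigmoid(b_hidden + W_hidden @ x)
    # O   = f( w_0 + sum_j w_j * H_j )    -- output combines the extracted features
    return sigmoid(b_out + np.dot(w_out, h))

# Illustrative values only
x = np.array([0.2, -1.0, 0.5])          # inputs I1..I3 (or X1..X3)
a = np.array([0.1, 0.4, -0.3, 0.2])     # regression coefficients a0..a3
W_hidden = 0.1 * np.random.randn(4, 3)  # 4 hidden nodes, 3 inputs
b_hidden = np.zeros(4)
w_out = 0.1 * np.random.randn(4)
b_out = 0.0
print(logistic_regression(x, a), mlp_forward(x, W_hidden, b_hidden, w_out, b_out))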
Neural Net 
• Network → Architecture can be linear (MLP) or circular (many RBF) 
• Network → Direct Connection allows inputs to connect to the output (to find the simple, linear solution first) 
• Network → Hidden Units can go up to 64 (much better than 8) 
• Profit/Loss uses the settings in the Data Source 
10
What does a Decision Tree Look Like? (Tree Depth = 2) 
11 
[Figure: a scatter of $ and c records over Age × Income, carved by Split 1 (on Age) and Splits 2 and 3 (on Income) into Leaf 1 – Leaf 4, next to the equivalent tree diagram: Split 1 at the root, Splits 2 and 3 below it, Leaves 1–4 at the bottom.] 
If (Age < Split1) then 
:…If (Income > Split2) then Leaf1 with dollar_avg1 
:…If (Income < Split2) then Leaf2 with dollar_avg2 
If (Age > Split1) then 
:…If (Income > Split3) then Leaf3 with dollar_avg3 
:…If (Income < Split3) then Leaf4 with dollar_avg4
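
Written as code, the depth-2 tree above scores a record like this (a sketch; the split points and leaf averages below are hypothetical placeholders):

def score_tree(age, income, split1, split2, split3, dollar_avg):
    # Depth-2 tree from the slide: split on Age first, then on Income
    if age < split1:
        return dollar_avg[1] if income > split2 else dollar_avg[2]   # Leaf 1 / Leaf 2
    else:
        return dollar_avg[3] if income > split3 else dollar_avg[4]   # Leaf 3 / Leaf 4

# Hypothetical split points and leaf averages
leaves = {1: 120.0, 2: 45.0, 3: 310.0, 4: 80.0}
print(score_tree(age=34, income=72000, split1=40, split2=55000, split3=90000, dollar_avg=leaves))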
Decision Tree 
• Primary Parameters to vary 
– Criterion 
• Probchisq (Default) 
• Entropy 
• Gini 
– Assessment (Decision vs. Lift) 
– Tree size (depth, leaf size, Xvalid) 
12
Gradient Boosting 
(Tree Based) 
Based on “Greedy Function Approximation: A Gradient Boosting Machine” by Jerome Friedman 
Each new CART tree: 
• is fit on a 60% random sample 
• is a small, general tree 
• forecasts the error of the summed forecast from all previous trees 
• may be one of 50 to 2,000 trees in a sequence 
• evaluate how far “back” in the sequence to prune 
13
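
A rough sketch of that boosting loop, using scikit-learn's DecisionTreeRegressor as the small CART-style tree (the 60% sample, depth and learning rate below are illustrative, not Friedman's exact settings):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=200, sample_frac=0.6, max_depth=3, learn_rate=0.1):
    # Each new small tree is fit on a random subsample and forecasts the
    # residual error of the summed forecast from all previous trees.
    n = len(y)
    forecast = np.full(n, y.mean())        # start from the overall mean
    trees = []
    for _ in range(n_trees):
        idx = np.random.choice(n, int(sample_frac * n), replace=False)
        residual = y - forecast            # error of the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[idx], residual[idx])    # small, general tree on a 60% sample
        forecast += learn_rate * tree.predict(X)
        trees.append(tree)
    return y.mean(), trees

def boosted_predict(base, trees, X, learn_rate=0.1, n_use=None):
    # n_use lets you evaluate how far "back" in the sequence to prune
    out = np.full(len(X), base)
    for tree in trees[:n_use]:
        out += learn_rate * tree.predict(X)
    return out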
DM Algorithms Available in Packages 
14 
# Modules per Forecasting Family in DM Software (one row per DM software package) 
Regression | Lasso Reg | Decision Tree | Neural Net | Support Vector Mach | Other | TOT 
2 | 1 | 0 | 0 | 0 | 1 | 4 
0 | 0 | 1 | 0 | 0 | 0 | 1 
3 | 0 | 3 | 3 | 0 | 3 | 12 
1 | 0 | 1 | 0 | 1 | 1 | 4 
0 | 0 | 4 | 0 | 0 | 0 | 4 
3 | 2 | 5 | 3 | 2 | 3 | 18 
0 | 0 | 0 | 0 | 0 | 5 | 5
Feel Overwhelmed by Lots of Complex 
Algorithm Parameters? GOOD! 
• A deep understanding of algorithms, math and assumptions helps significantly → Heuristics 
– e.g. regression typically has a problem with correlated inputs, because the solution calculation uses matrix inversion (a concern if you worry about weight sign inversion) 
– SVMs or Bayesian Nets do not have this problem, because they are solved differently. 
• With correlated inputs their input selection becomes more random – but you still get a decent solution 
• How can you manage the details? 
– I am glad you asked… moving on to the next section 
15
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System – Sensitivity 
16
Model Exploration Process 
• Scientific Method of Hypothesis → Test 
– If you change ONE thing, then any change in the results is because of that one change 
– Design of Experiments (DOE), test plan 
– Best to compare model settings on the same data version 
• New data versions add new preprocessed fields, or new months (records) 
– Key design objective: all experiments are reproducible 
• SAME random split between Learning – Test – Validation, with a consistent random seed 
– Do the L-T-V split before loading data into a tool, so the same partitioning applies to all tools/libraries/algorithms
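
For example, one way to make the split reproducible outside any single tool is to assign partitions once, with a fixed seed, before loading the data anywhere (a sketch; the target column name and fractions are placeholders):

import numpy as np
import pandas as pd

def stratified_ltv_split(df, target_col, seed=20110228, frac=(0.5, 0.25, 0.25)):
    # One stratified Learn / Tune / Validate assignment per record, with a
    # consistent random seed, so every tool / library / algorithm sees the
    # identical partitions.
    rng = np.random.RandomState(seed)
    df = df.reset_index(drop=True)
    labels = np.empty(len(df), dtype=object)
    for _, grp in df.groupby(target_col):
        pos = rng.permutation(grp.index.to_numpy())   # shuffle within each stratum
        n_learn = int(frac[0] * len(pos))
        n_tune = int(frac[1] * len(pos))
        labels[pos[:n_learn]] = "learn"
        labels[pos[n_learn:n_learn + n_tune]] = "tune"
        labels[pos[n_learn + n_tune:]] = "validate"
    return df.assign(partition=labels)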
Model Notebook 
18 
Input Parameters → Outcomes (Lift in Top 10%); Gap = Abs(Trn − Val) 

Data Ver | Algor | Mod Num | vars offerd | var selct | Vars Seltd | Trn Time | Train | Val | Gap | Consrv Result 
1 | Regrsn | 1 | 27 | stepw | 9 | 12 | 5.77 | 5.94 | 0.17 | 5.60 

Bad vs. Good 
Data Ver | Algor | Mod Num | vars offerd | Hidn Nodes | Direct Conn | Arch | Vars Seltd | Trn Time | Train | Val | Gap | Consrv Result 
1 | Neural | 1 | 27 | 3 | n | MLP | all | 77 | 6.65 | 10.89 | 4.24 | 2.41 
1 | Neural | 2 | 27 | 10 | n | MLP | all | 40 | 6.88 | 6.73 | 0.15 | 6.58 
1 | Neural | 3 | 27 | 10 | Y | MLP | all | 36 | 6.40 | 6.93 | 0.53 | 5.87 
1 | Neural | 4 | 27 | 10 | n | RBF | all | 34 | 5.67 | 5.54 | 0.13 | 5.41 
1 | Neural | 5 | 27 | 10 | Y | RBF | all | 35 | 5.95 | 7.92 | 1.97 | 3.98
Model Notebook 
Outcome Details 
• My Heuristic Design Objectives (yours may be different): 
– Accuracy in deployment 
– Reliability and consistent behavior: a general solution 
• Use one or more hold-out data sets to check consistency 
• Penalize more as the forecast becomes less consistent 
– No penalty for model complexity (if it validates consistently) 
• Let me drive a car to work, instead of limiting me to a bike 
• Message for the check writer: don’t consider only Occam’s Razor – value consistently good results 
– Develop a “smooth, continuous metric” to sort and find the models that perform “best” in future deployment 
19
Model Notebook 
Outcome Details 
• Training = results on the training set 
• Validation = results on the validation hold-out 
• Gap = abs( Training – Validation ) 
– A bigger gap (volatility) is a bigger concern for deployment; it is a symptom 
– Minimize Senior VP heart attacks! (one penalty for volatility) 
– Set expectations & meet expectations 
– Regularization helps significantly 
• Conservative Result = worst( Training, Validation ) + Gap penalty 
– Corr / Lift / Profit → higher is better: Cons Result = min(Trn, Val) - Gap 
– MAD / RMSE / Risk → lower is better: Cons Result = max(Trn, Val) + Gap 
– Business Value or Pain ranking = function of( conservative result ) 
20
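
As a concrete illustration, the metric can be computed as below (a sketch, with the gap penalty equal to one times the gap, exactly as in the formulas above):

def conservative_result(train, val, higher_is_better=True):
    # Corr / Lift / Profit (higher is better): min(Trn, Val) - Gap
    # MAD / RMSE / Risk   (lower is better):  max(Trn, Val) + Gap
    gap = abs(train - val)
    if higher_is_better:
        return min(train, val) - gap
    return max(train, val) + gap

# Example from the notebook: Neural model 2, lift in the top 10%
print(conservative_result(6.88, 6.73))   # 6.73 - 0.15 = 6.58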
Model Notebook 
21 
(Repeats the Model Notebook table from slide 18.)
Model Notebook Process 
Tracking Detail → Training the Data Miner 
Model Notebook – Project = Transit, Last Update 5/6/2010 
Row layout: Data Ver | Author | Algor | Mod Num | chng from prior | input parameters (vars offered, var selectn, …) | Var Sel | Trn Time (sec), then for each reported lift band (Top 5% / 10% / 20% over file average): Train, Val, Gap = Abs(Trn-Val), Consrv Result.
1 GM B logistic 1 0 27 stepws 10 12.04 8.12 3.92 4.20 7.59 4.85 2.74 2.11 
1 GM B logistic 2 1 19 stepws 10 12.04 8.12 3.92 4.20 7.59 4.85 2.74 2.11 
1 GM B logistic 3 1 6, no dbc stepws 4 7.51 1.98 5.53 -3.55 4.90 3.96 0.94 3.02 investigate inconsistency 
1 GM B logistic 4 1 13, only dbc stepws 7 9.58 7.33 2.25 5.08 6.59 5.25 1.34 3.91
Regression sub-table – input parameters: vars offered | regr type | var selectn | 2-factor interact | polynom | Var Sel | Trn Time.
1 GM regr 1 0 27 logistic stepws n 9 12 5.77 5.94 0.17 5.60 3.35 4.46 1.11 2.24 2.25 3.02 0.77 1.48 
1 GM regr 2 1 27 logistic stepws Yes 9 16 5.76 5.94 0.18 5.58 3.35 4.46 1.11 2.24 2.25 3.02 0.77 1.48 
1 GM regr 3 1 27 logistic stepws n 2 10 57 5.86 6.93 1.07 4.79 3.48 5.03 1.55 1.93 2.32 2.61 0.29 2.03 
1 GM regr 4 1 27 logistic stepws Yes 2 11 58 5.86 6.93 1.07 4.79 3.48 5.04 1.56 1.92 2.32 2.92 0.60 1.72 
4 GM regr 5 4 3 logistic stepwise Yes 2 8 63 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43 
4 GM regr 6 5 28 logistic stepwise Yes 2 – didn't finish, out of memory
4 GM regr 7 5 3 logistic stepwise n 2 63 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43 
4 GM regr 8 5 3 logistic stepwise n 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43 
4 GM regr 9 5 3 logistic stepwise Yes 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43 
4 GM regr 10 8 28 logistic stepwise n 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43 
4 GM regr 11 5 3 logistic stepwise Yes 3 6 78 15.98 16.06 0.08 15.89 8.61 8.03 0.58 7.45 4.81 4.39 0.41 3.98 
4 GM regr 12 5 3 logistic stepwise Yes 4 2 78 15.98 16.06 0.08 15.89 8.61 8.03 0.58 7.45 4.81 4.39 0.41 3.98 
4n GM regr 13 11 3 logistic stepwise Yes 3 6 78 18.39 18.79 0.39 18.00 9.58 9.55 0.03 9.52 4.96 4.92 0.03 4.89   (add Feb & Mar to recent*)   Yippeee! 
4n GM regr 14 11 3 6 78 12.49 12.12 0.36 11.76 7.63 7.42 0.20 7.22 4.29 4.47 0.18 4.12   (recent_serrtrn_dbc changed to recent_serrtrn_flag; does DBC on ser patt help? YES)
1 GM DM Regr 1 0 27 logistic stepws 13 15 12.00 3.17 8.83 -5.66 7.21 4.16 3.05 1.11 4.28 3.07 1.21 1.86 
4 GM DM Regr 2 0 28 max v 3000 min rsq 0.005 use aov16 var YES 6 72 16.27 15.76 0.52 15.24 8.67 8.03 0.64 7.39 4.58 4.24 0.34 3.90
1 GM PLS 1 0 
1 GM PLS 2 1 27 default default default default 4 18 11.26 3.08 8.18 -5.10 7.12 4.85 2.27 2.58 4.28 3.12 1.16 1.96 
1 GM PLS 3 1 Test Set Cros Val didn't finish, don't use Xvalidation 
4 GM PLS 4 0 28 PLS NIPALS 200 28 122 16.63 15.76 0.87 14.89 8.93 8.03 0.90 7.13 4.76 4.32 0.45 3.87 
Neural sub-table – input parameters: vars offered | hidden | Direct Conn? | arch | Var Sel | Trn Time.
1 GM AutoNrl 1 0 27 2 n MLP all 35 4.19 3.76 0.43 3.33 2.47 2.57 0.10 2.37 1.77 1.88 0.11 1.66 
1 GM AutoNrl 2 1 27 6 n MLP all 189 4.37 2.77 1.60 1.17 2.82 1.78 1.04 0.74 1.98 1.93 0.05 1.88 
1 GM AutoNrl 3 1 27 8 n MLP AutoNeural trn action = search all 532 0.83 0.56 0.27 0.29 0.83 0.56 0.27 0.29 0.83 0.56 0.27 0.29 
1 GM AutoNrl 4 1 27 8 n MLP activ = logistic all 356 5.12 2.97 2.15 0.82 3.02 3.37 0.35 2.67 1.90 2.57 0.67 1.23 
1 GM AutoNrl 5 1 27 6 n MLP arch = block all 130 0.89 0.97 0.08 0.81 
1 GM AutoNrl 6 1 27 6 n MLP arch = funnel all 595 1.36 1.08 0.28 0.80
4 GM AutoNrl 7 1 28 6 n MLP all 1201 16.2722 15.76 0.51 15.24 8.65 7.88 0.77 7.11 4.46 4.24 0.22 4.03 
Neural sub-table – input parameters: vars offered | hidden | Direct Conn? | arch | Decay | Decision Weight | Var Sel | Trn Time.
1 GM Neural 1 0 27 3 n MLP all 77 6.65 10.89 4.24 2.41 3.90 6.53 2.63 1.27 2.52 3.96 1.44 1.08 
1 GM Neural 2 1 27 10 n MLP all 40 6.88 6.73 0.15 6.58 3.97 4.55 0.58 3.39 2.56 3.02 0.46 2.10 
1 GM Neural 3 1 27 10 Y MLP all 36 6.40 6.93 0.53 5.87 3.49 5.45 1.96 1.53 2.32 3.22 0.90 1.42 
1 GM Neural 4 1 27 10 n RBF (orbfeq) all 34 5.67 5.54 0.13 5.41 3.25 4.85 1.60 1.65 2.20 3.22 1.02 1.18 
1 GM Neural 5 1 27 10 Y RBF all 35 5.95 7.92 1.97 3.98 3.48 4.85 1.37 2.11 2.31 3.17 0.86 1.45 
js1 JS Neural 6 0 17 5 n MLP Softmax 10,-5,-1,0 all 6.03 6.53 0.50 5.53 3.40 4.55 1.15 2.25 2.67 3.36 0.69 1.98 
js1 JS Neural 7 6 15 5 Y MLP Softmax 10,-5,-1,0 all 6.14 5.74 0.40 5.34 3.59 2.97 0.62 2.35 2.77 2.37 0.40 1.97 
js1 JS Neural 8 6 15 3 Y MLP Softmax 0.5 10,-5,-1,0 all 6.27 7.13 0.86 5.41 3.54 3.56 0.02 3.52 2.74 2.57 0.17 2.40 
js1 JS Neural 9 6 15 3 n MLP Softmax 0.5 10,-5,-1,0 all 6.27 6.33 0.06 6.21 3.57 4.65 1.08 2.49 2.76 2.82 0.06 2.70 
2 GM Neural 10 2 35 12 Y MLP 20,0,-1,0 all 
3 GM Neural 11 2 45 20 n MLP 20,0,-1,0 all 18 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.91 
3 GM Neural 12 11 45 20 n MLP 0.8 20,0,-1,0 all 16 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.91 
3 GM Neural 13 11 45 20 n MLP 0.6 20,0,-1,0 all 16 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.91 
4 GM Neural 14 11 3 20 n MLP 0.01 20,0,-1,0 all 204 16.39 15.15 1.24 13.91 8.67 8.03 0.64 7.39 4.82 4.39 0.43 3.97 
4 GM Neural 15 11 28 20 n MLP 0.01 20,0,-1,0 all 713 16.39 15.76 0.63 15.12 8.54 7.88 0.66 7.22 4.40 4.25 0.15 4.11 
4 GM Neural 16 15 31 40 n MLP 0.01 20,0,-1,0 all 782 18.02 18.18 0.16 17.86 9.21 9.55 0.34 8.87 4.60 4.77 0.17 4.44 
4 GM Neural 17 15 same, max iter 20 --> 50 all 1754 18.02 18.18 0.16 17.86 9.21 9.55 0.34 8.87 4.66 4.77 0.11 4.55 
4 GM Neural 18 16 29 (no twoYr) same, max iter 20 --> 50 40 0 0 all 18.386 18.98 18.18 0.80 17.38 9.25 9.59 0.34 8.90 4.67 4.86 0.20 4.47
4n GM DMNeural 19 0 13 3 n all 19 10.60 2.57 8.03 -5.46 6.93 4.36 2.57 1.79 4.14 2.57 1.57 1.00 
Heuristic Strategy: 
1) Try a few models of many algorithm types (seed the search) 
2) Opportunistically spend more effort on what is working (invest in top stocks) 
3) Still try a few trials on medium success (diversify, limited by the project time-box) 
4) Try ensemble methods, combining model forecasts & top source vars w/ a 2nd-stage model 
“The Data Mining Battle Field”
Model Notebook Process 
Tracking Detail → Training the Data Miner 
Decision Tree sub-table – row layout: M cnt | Data Ver | Author | Algor | Mod Num | chng from prior | vars offered | criterion | max depth | leaf size | assess (e.g. 5% Lift) | Decision Weight | Var Sel | Trn Time, then Train, Val, Gap, Consrv Result for each lift band.
47 1 GM Dec Tree 1 0 27 default 6 5 20,0,-5,0 7 13 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.27 
48 1 GM Dec Tree 2 1 27 probchisq 6 5 20,0,-5,0 7 16 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.27 
49 1 GM Dec Tree 3 1 27 entropy 6 5 20,0,-5,0 6 16 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 
50 1 GM Dec Tree 4 1 27 gini 6 5 20,0,-5,0 10 22 13.76 11.28 2.48 8.80 7.70 6.10 1.60 4.50 4.32 3.71 0.61 3.10 
51 1 GM Dec Tree 5 3 27 entropy 12 5 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 
52 1 GM Dec Tree 6 3 27 entropy 6 10 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 
53 1 GM Dec Tree 7 3 27 entropy 6 100 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 
54 1 GM Dec Tree 8 3 27 entropy 6 100 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54 
55 1 GM Dec Tree 9 3 27 entropy 6 5 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54 
56 1 GM Dec Tree 10 3 27 entropy 6 5 obs import = Y 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 
DecisionTree – Data Version 1
57 1 GM Dec Tree 11 3 27 entropy 6 5 asses = 5% Lift 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
58 1 GM Dec Tree 12 3 27 entropy 10 2 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 
46 2 GM Dec Tree 13 3 33 entropy 6 5 a=5% lift 20,0,-5,0 7 16 15.92 14.96 0.96 14.00 8.29 7.84 0.45 7.39 4.40 4.17 0.23 3.94 
47 2 GM Dec Tree 14 13 33 entropy 6 5 a=5% lift 10,-2.5,-1,0 13 15 16.32 15.05 1.27 13.78 9.07 8.00 1.07 6.93 4.63 4.08 0.55 3.53 
48 2 GM Dec Tree 15 13 33 entropy 6 5 a=5% lift 1,-1,1,-1 8 15 15.30 14.34 0.96 13.38 7.98 7.53 0.45 7.08 4.25 4.05 0.20 3.85 
49 2 GM Dec Tree 16 13 33 entropy 6 5 a=5% lift 10,-1,1,-1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84 
50 2 GM Dec Tree 17 13 33 entropy 6 5 a=5% lift 20,-5,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95 
51 2 GM Dec Tree 18 13 33 entropy 6 5 a=5% lift 20,-1,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95 
52 2 GM Dec Tree 19 13 33 entropy 6 5 a=5% lift xval = no 20,0,-1,0 6 15 15.87 15.52 0.35 15.17 8.26 8.12 0.14 7.98 4.40 4.32 0.08 4.24
53 2 GM Dec Tree 20 13 33 entropy 6 5 a=5% lift 20,-5,-1,1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84 
54 2 GM Dec Tree 21 13 33 entropy 6 5 a=5% lift xval = no 20,0,0,1 9 16 16.17 15.57 0.60 14.97 8.74 8.25 0.49 7.76 4.44 4.21 0.23 3.98 
55 2 GM Dec Tree 22 19 33 gini 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12 
56 2 GM Dec Tree 23 19 33 probchisq 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12 
57 2 GM Dec Tree 24 19 33 entropy 20 5 a=5% lift 20,0,-1,0 19 26 18.94 15.42 3.52 11.90 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22 
DecisionTree – Data Version 2
58 2 GM Dec Tree 25 19 33 entropy 20 20 a=5% lift 20,0,-1,0 19 26 18.94 13.80 5.14 8.66 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22 
59 2 GM Dec Tree 26 19 33 entropy 20 40 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05 
60 2 GM Dec Tree 27 19 33 entropy 20 60 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05 
61 2 GM Dec Tree 28 19 33 entropy 7 5 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52 
62 2 GM Dec Tree 29 19 33 entropy 7 10 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52 
63 2 GM Dec Tree 30 19 33 entropy 7 20 a=5% lift 20,0,-1,0 7 37 16.04 14.66 1.38 13.28 8.35 7.69 0.66 7.03 4.41 4.07 0.34 3.73 
64 2 GM Dec Tree 31 19 35 entropy 7 40 a=5% lift 20,0,-1,0 7 36 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14   (itmledratio, itm_to_led)
65 2 GM Dec Tree 32 19 35 entropy 7 60 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14 
66 2 GM Dec Tree 33 19 35 entropy 7 80 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14 
67 2 GM Dec Tree 34 19 35 entropy 7 100 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14 
68 2 GM Dec Tree 35 19 35 entropy 7 150 a=5% lift 20,0,-1,0 5 37 14.53 13.08 1.45 11.63 7.75 7.19 0.56 6.63 4.36 4.29 0.07 4.22 
64 2 GM Dec Tree 36 19 35 entropy 6 5 a=5% lift 20,0,-1,0 7 29 15.91 14.95 0.96 13.99 8.29 7.83 0.46 7.37 4.40 4.17 0.23 3.94 
(ex=20k, node smp = 30k) 
65 2 GM Dec Tree 37 19 14, raw only entropy 6 5 a=5% lift 0 20,0,-1,0 7 16 13.92 11.81 2.11 9.69 7.46 6.54 0.93 5.61 4.24 3.91 0.33 3.57
5.28 2.15 0.41 – improvement gain in Conservative Lift from the new variables (vs. DecTree-d2-m19)
66 3 GM Dec Tree 38 19 45 entropy 8 5 a=5% lift xval = no 20,0,-5,1 3 39 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58 
67 3 GM Dec Tree 39 38 45 gini 8 5 a=5% lift xval = no 20,0,-5,1 3 71 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58 
68 3 GM Dec Tree 40 38 45 propchi 8 5 a=5% lift xval = no 20,0,-5,1 3 42 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
69 3 GM Dec Tree 41 38 45 entropy 20 5 a=5% lift subtr= 20,0,-5,1 33 91 20.00 14.81 5.19 9.61 10.00 7.54 2.46 5.08 5.00 3.90 1.10 2.80 
70 3 GM Dec Tree 42 38 45 entropy 20 100 a=5% lift sub=lrg 20,0,-5,1 25 70 19.09 16.25 2.84 13.42 10.00 8.17 1.83 6.35 5.00 4.19 0.81 3.38 
71 3 GM Dec Tree 43 38 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 23 64 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67 
72 3 GM Dec Tree 44 38 45 entropy 20 400 a=5% lift sub=lrg 20,0,-5,1 21 59 15.87 17.08 1.21 14.67 9.02 8.96 0.06 8.89 4.97 4.69 0.28 4.41 
73 3 GM Dec Tree 45 38 45 entropy 20 800 a=5% lift sub=lrg 20,0,-5,1 16 52 14.35 16.16 1.81 12.53 8.46 8.96 0.50 7.96 4.78 4.79 0.01 4.78 
DecisionTree – Data Version 3
74 3 GM Dec Tree 46 38 45 entropy 20 1600 a=5% lift sub=lrg 20,0,-5,1 16 47 14.25 16.02 1.78 12.47 8.26 8.59 0.34 7.92 4.58 4.42 0.17 4.25 
75 3 GM Dec Tree 47 38 45 entropy 20 3200 a=5% lift sub=lrg 20,0,-5,1 10 39 12.45 14.35 1.91 10.54 7.49 8.31 0.82 6.67 4.36 4.48 0.12 4.24 
76 3 GM Dec Tree 48 43 45 entropy 20 150 a=5% lift sub=lrg 20,0,-5,1 23 68 18.57 16.25 2.32 13.93 10.00 8.14 1.86 6.27 5.00 4.17 0.83 3.34 
77 3 GM Dec Tree 49 43 45 entropy 20 300 a=5% lift sub=lrg 20,0,-5,1 23 62 16.45 17.86 1.41 15.03 9.31 8.96 0.35 8.61 5.00 4.60 0.40 4.20 
78 3 GM Dec Tree 50 43 45 entropy 20 250 a=5% lift sub=lrg 20,0,-5,1 24 65 16.64 17.71 1.07 15.57 9.56 8.96 0.60 8.36 5.00 4.61 0.39 4.21 
79 3 GM Dec Tree 51 43 45 entropy 20 350 a=5% lift sub=lrg 20,0,-5,1 24 67 16.07 17.50 1.43 14.64 9.19 8.96 0.23 8.73 5.00 4.59 0.41 4.18 
80 3 GM Dec Tree 52 43 45 entropy 20 225 a=5% lift sub=lrg 20,0,-5,1 23 63 17.85 16.67 1.18 15.49 9.83 8.96 0.87 8.09 5.00 4.53 0.48 4.05 
81 3 GM Dec Tree 53 43 45 entropy 20 175 a=5% lift sub=lrg 20,0,-5,1 26 68 18.15 16.25 1.90 14.35 9.97 8.13 1.84 6.28 5.00 4.16 0.84 3.32 
82 3 GM Dec Tree 54 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5.0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67 
83 3 GM Dec Tree 55 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-1,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67 
84 3 GM Dec Tree 56 43 45 entropy 20 200 a=5% lift sub=lrg 20,-5,0,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67 
85 4 GM Dec Tree 57 43 146 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 9 149 20.00 14.09 5.91 8.19 10.00 7.20 2.80 4.40 5.00 3.76 1.24 2.51 
86 4 GM Dec Tree 58 57 107 (tree settings the same, dropped INT* categorical vars, not DBC) 18 115 20.00 16.09 3.91 12.18 10.00 8.15 1.85 6.29 5.00 4.18 0.82 3.35
87 4 GM Dec Tree 59 57 107 entropy 20 500 a=5% lift sub=lrg 20,0,-5,1 13 110 19.46 14.79 4.68 10.11 10.00 7.64 2.36 5.29 5.00 3.95 1.05 2.91
88 4 GM Dec Tree 60 57 107 entropy 20 1000 a=5% lift sub=lrg 20,0,-5,1 10 89 18.94 14.47 4.47 10.00 10.00 7.44 2.56 4.88 5.00 3.86 1.14 2.73 
89 4 GM Dec Tree 61 57 107 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 81 14.41 13.91 0.50 13.41 9.54 8.02 1.51 6.51 6.61 4.25 2.36 1.90 
90 4 GM Dec Tree 62 57 107 entropy 20 3000 a=5% lift sub=lrg 20,0,-5,1 5 71 9.89 7.91 1.98 5.94 8.74 6.39 2.35 4.04 5.00 3.70 1.30 2.40 
91 4 GM Dec Tree 63 57 107 entropy 20 1500 a=5% lift sub=lrg 20,0,-5,1 9 60 16.17 14.66 1.50 13.16 9.89 8.18 1.71 6.47 5.00 3.38 1.62 1.76 
DecisionTree – Data Version 4
92 4 GM Dec Tree 64 57 107 entropy 20 1750 a=5% lift sub=lrg 20,0,-5,1 7 60 15.23 14.32 0.92 13.40 9.68 8.07 1.61 6.46 5.00 4.26 0.75 3.51 
93 4 GM Dec Tree 65 57 107 entropy 20 2250 a=5% lift sub=lrg 20,0,-5,1 5 60 15.43 11.00 4.43 6.56 9.55 6.30 3.25 3.05 5.00 3.70 1.30 2.40 
94 4 GM Dec Tree 66 61 58 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 105 14.07 13.92 0.15 13.77 8.45 7.88 0.57 7.30 4.74 4.02 0.73 3.29 
95 4 GM Dec Tree 67 61 80 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 97 14.25 13.94 0.30 13.64 9.25 7.88 1.37 6.51 5.00 4.25 0.75 3.49 
96 4 GM Dec Tree 68 61 103 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 103 14.41 13.72 0.69 13.03 9.54 8.02 1.52 6.50 5.00 4.25 0.75 3.50 
Interactions are getting selected; they improve Trn results but decrease Val results. Perhaps I should regen the INT*dbc with a larger number of min records.
97 4n GM Dec Tree 69 61 3 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 7 14.61 15.54 0.93 13.68 8.83 8.99 0.16 8.67 4.88 4.73 0.15 4.58 
98 4n GM Dec Tree 70 0 20 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 10 11.50 11.12 0.38 10.74 7.08 7.29 0.21 6.87 4.24 3.94 0.30 3.64 
use RAW vars ONLY, to test value of my preprocessing 
Rule Induction sub-table – row layout: M cnt | Data Ver | Author | Algor | Mod Num | chng from prior | binary model | cleanup model | max num rips | Var Sel | Trn Time, then Train, Val, Gap, Consrv Result for each lift band.
94 1 GM Rule Ind 1 0 tree neural 16 32 10.77 9.92 0.85 9.07 6.28 5.60 0.68 4.92 3.35 3.09 0.26 2.83 
95 1 GM Rule Ind 2 1 regr neural 16 36 5.95 7.52 1.57 4.38 3.55 4.85 1.30 2.25 2.35 3.17 0.82 1.53 
96 1 GM Rule Ind 3 1 neural tree 16 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37 
97 1 GM Rule Ind 4 3 neural tree 4 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37 
98 1 GM Rule Ind 5 3 neural tree 32 121 5.95 7.92 1.97 3.98 3.53 5.64 2.11 1.42 2.34 3.32 0.98 1.36 
99 1 GM Rule Ind 6 1 tree neural 32 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75 
“Agile Software Design”: get something simple, fully working and tested early on (Data Version 1). 
Data Versions 2…4: working, incremental improvements – incremental complexity, different preprocessing, add more fields and records, add & test more complexity.
Model Notebook Process 
Tracking Detail → Training the Data Miner
(The Decision Tree and Rule Induction tracking tables from the previous slide are shown again here.)
Can treat the model notebook table as meta-data (i.e. 144 records, or models). 
Train models on the meta-data: 
Source vars = model training parameters 
Target 1 = conservative result, or Target 2 = training time 
Perform sensitivity analysis to answer questions: 
Q) Searching which model training parameters leads to the best results? 
Q) …most training time?
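
One way to act on this (a sketch; it assumes the notebook has been exported to a CSV with hypothetical column names): fit a meta-model that predicts the conservative result, or the training time, from the training parameters, then rank the parameters.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

meta = pd.read_csv("model_notebook.csv")     # hypothetical export: one row per trained model
params = ["vars_offered", "max_depth", "leaf_size", "hidden_nodes"]   # hypothetical columns
X = meta[params].fillna(0)
y = meta["conservative_result"]              # Target 1; use meta["trn_time"] for Target 2

meta_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Which training parameters drive the best results (or the most training time)?
for name, imp in sorted(zip(params, meta_model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:15s} {imp:.3f}")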
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System – Sensitivity 
25
Design Of Experiments (DOE) 
Parameter Search 
• Ideally, vary one parameter at a time and quantify the results 
– A bigger challenge with BIG DATA compute per model 
• Exhaustive Grid Search, O(3^P) 
– for Param A = Low, Med, High (test 3 settings) 
– for Param B = Low, Med, High 
– for Param C = Low, Med, High 
– Easy to implement, but not the most efficient 
– Can use a Fractional Factorial design (i.e. 10%) 
– Scales less effectively for many parameters 
• Stochastic Search (Genetic Algorithms), O(100^2) 
– Directed Random Search is more efficient than Grid Search, but… 
– Can be overkill in complexity: (100 models / generation) * (100’s of generations) 
• Taguchi Analysis (works with this DOE approach) 
– Efficient multivariate orthogonal search 
– e.g. testing landing pages w/ Offermatica (acquired by Omniture in 2007 for DOE) 
– http://en.wikipedia.org/wiki/Taguchi_methods 
– Does not use domain knowledge of parameter interactions - OPPORTUNITY
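
A sketch of the exhaustive O(3^P) grid and a crude fractional version (a random subset, not a true orthogonal fractional-factorial design); train_and_score stands in for whatever trains one model and returns its conservative result:

import itertools
import random

grid = {
    "criterion": ["probchisq", "entropy", "gini"],   # Param A: three settings
    "max_depth": [6, 12, 20],                        # Param B
    "leaf_size": [5, 100, 2000],                     # Param C
}

def grid_search(train_and_score, grid, fraction=1.0, seed=0):
    # Exhaustive when fraction=1.0 (3^P runs here); roughly fractional when
    # fraction < 1.0 (e.g. 0.10 keeps ~10% of the cells).
    cells = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    if fraction < 1.0:
        random.Random(seed).shuffle(cells)
        cells = cells[: max(1, int(fraction * len(cells)))]
    results = [(train_and_score(**cell), cell) for cell in cells]
    return max(results, key=lambda r: r[0])   # best conservative result and its settings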
Taguchi Design 
• Not a full grid search 
• Can we improve with experience and a heuristic process? 
27 
http://www.itl.nist.gov/div898/handbook/pri/section5/pri56.htm 
http://www.jmp.com/support/downloads/pdf/jmp_design_of_experiments.pdf
Model Parameters 
Design of Experiments (DOE) over your choices 
Algorithm | Model Parameters (the algorithm searches) | Model Training Parameters (meta-search by a data miner) 
Regression | weights | variable selct (forward, step) 
Neural net | weights | step size; learning rate 
Decision Tree | splits, e.g. (spend < $1000) | max depth; (Gini, Entropy) 
28
Model Parameters vs. 
Model Training Parameters 
Algorithm | Model Parameters (the algorithm searches) | Model Training Parameters (meta-search by a data miner, DOE over your choices) 
Regression | weights | variable select (forward, step) 
Neural net | weights | step size; learning rate 
Decision Tree | (spend < $1000) | max depth; (Gini, Entropy) 
29
Heuristic Planning Your 
Design of Experiments (DOE) 
• Assumptions about the Data Mining Project 
– May be on BIG DATA, with practical constraints 
– May be training 4 to 400 models (not 4000+ like a GA) 
– Want diversity, to investigate different algorithms 
– Want to generalize the process to future deployments 
• Heuristic Strategies 
– Use knowledge of interacting parameters (parallel tests) 
• (Cost+profit weights) and (boosting weights) fight each other 
– Delay searching compute-intensive parameters 
• First stabilize most other “computationally reasonable” params 
• e.g. large decision tree depth, neural nets w/ lots of connections 
– Opportunistically spend time by algorithm success 
30
Gradient Descent Numerical Methods 
Searching to Find Minima 
31 
[Figure: an error surface over Weight Parameter 1 × Weight Param 2, shaded from High Error (hill tops, forest, fields) down to Low Error (beach, water, deep water), with several local minima marked “Min”.]
Gradient Descent Numerical Methods 
Searching to Find Minima 
32 
“Ski down” from the mountains to Lake Tahoe: moving = adjusting a parameter; X = starting position; M = a local minimum. 
[Figure: the same error surface over Weight Parameter 1 × Weight Param 2 (High Error on the hill tops, Low Error at the beach / water), with a path from X down to a local minimum M.]
Conservative Result with Respect to 
Model Training Parameters 
33 
“Ski down” from the mountains to Lake Tahoe: moving = adjusting a training parameter; X = starting position; M = a local minimum. 
[Figure: the same error surface (High Error to Low Error), but the axes are now Model Parameter 1 × Model Param 2, with a path from X to a local minimum M.]
Heuristic Planning Your 
Design of Experiments (DOE) 
• Start with a reasonable default setting of the parameters 
– the “center of the daisy” → the gradient check 
• Vary one parameter at a time from the center 
– “each petal of the daisy” → a gradient search trial 
• Move to the next “reasonable multivariate start” 
– the “stem of the daisy” → steepest descent 
34
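
The daisy as code, a sketch: score the center, try one-parameter-at-a-time petals, then move the center to the best petal and repeat (steepest descent). train_and_score is a stand-in for training one model and returning its conservative result; the example values mirror the decision tree notebook.

def meta_gradient_search(train_and_score, center, candidates, n_steps=5):
    # center: dict of the current training-parameter settings (center of the daisy)
    # candidates: for each parameter, the values to try around the center (the petals)
    best_score = train_and_score(**center)
    for _ in range(n_steps):
        petals = []
        for param, values in candidates.items():
            for value in values:
                if value == center[param]:
                    continue
                trial = dict(center, **{param: value})      # vary ONE parameter
                petals.append((train_and_score(**trial), trial))
        top_score, top_trial = max(petals, key=lambda p: p[0])
        if top_score <= best_score:
            break                                           # no petal improves: stop
        best_score, center = top_score, top_trial           # the stem: move the center
    return best_score, center

# Example settings, mirroring the notebook:
# center = {"criterion": "entropy", "max_depth": 6, "leaf_size": 5}
# candidates = {"criterion": ["probchisq", "entropy", "gini"],
#               "max_depth": [6, 12, 20], "leaf_size": [5, 10, 100]}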
Heuristic “Meta-Gradient Search” of 
Model Training Parameters 
35 
[Figure: contour plot of error over Parameter 1 × Parameter 2 (High Error to Low Error), with the search heading toward a minimum M.]
Heuristic “Meta-Gradient Search” of 
Model Training Parameters 
36 
[Figure: the same contour plot over Parameter 1 × Parameter 2, continuing the search toward the minimum M.]
Heuristic “Meta-Gradient Search” of 
Model Training Parameters 
37 
[Figure: the search over Parameter 1 × Parameter 2 reaching the minimum M.] 
vs. Taguchi DOE – Art vs. Science? No, a practical complement using existing numerical methods.
Heuristic “Meta-Gradient Search” of 
Model Training Parameters 
38 
Mod Num | chng from prior | vars offered | criterion | max depth | leaf size
1 0 27 default 6 5 
2 1 27 probchisq 6 5 
3 1 27 entropy 6 5 
4 1 27 gini 6 5 
5 3 27 entropy 12 5 
6 3 27 entropy 6 10 
7 3 27 entropy 6 100 
8 3 27 entropy 6 100 
9 3 27 entropy 6 5 
10 3 27 entropy 6 5 
11 3 27 entropy 6 5 
12 3 27 entropy 10 2 
Can you give a more tangible example? This sounds a bit vague. 
“Change from Prior Model” tracks the change from the “center of a daisy” (Model 1 or 3).
Heuristic “Meta-Gradient Search” of 
Model Training Parameters 
• After stabilizing most of the “fast” and “medium” compute-time parameters, search the “long compute time” settings 
• With the final parameter settings, if 2x or 10x more data is available, perform a “final bake-in”, long training run 
• Then try Ensemble Methods 
– Stacking, boosting, bagging: combining many of the best models 
– Gradient Boosting over the residual error 
– Select models whose residual errors correlate the least 
– Use a 2nd-stage model to combine the 1st-stage models and top preprocessed fields (for context switching) 
– Last year’s KDD Cup winners 
– Netflix winners used Ensemble methods
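
A sketch of the 2nd-stage (“stacking”) step, assuming the 1st-stage forecasts and a few top preprocessed fields have already been scored onto a hold-out file (all column names below are hypothetical):

import pandas as pd
from sklearn.linear_model import LogisticRegression

stack = pd.read_csv("first_stage_forecasts.csv")    # hypothetical hold-out scores
first_stage = ["p_neural", "p_dectree", "p_regr"]   # forecasts whose residual errors correlate least
context = ["recency", "spend_ratio"]                # top preprocessed fields, for context switching

X = stack[first_stage + context]
y = stack["target"]

second_stage = LogisticRegression(max_iter=1000).fit(X, y)   # combines the 1st-stage models
stack["p_ensemble"] = second_stage.predict_proba(X)[:, 1]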
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System 
Sensitivity Analysis 
40
Needs to Describe the Forecast Algorithm 
• Many Data Mining solutions need a description 
– For the check writer (an SVP, owner, business unit, …): a business reality check before deployment 
– “What if” analysis, to fine-tune a larger system 
• Feed Operations Research or Revenue Management systems 
– Need a modeling “descriptive simulation” (political donations) 
– When evaluating credit, you are required by law to offer 4 “reason codes” for each person scored – when they are declined 
• Should the Data Miner cut algorithm choices? 
– NO! “I understand how a bike works, but I drive a car to work” 
– How much detailed understanding is needed? 
– Provide enough info to “drive the car” vs. “build the car” 
• The check writer does not need to understand a B-tree to buy SQL 
41
Sensitivity Analysis 
(OAT) One At a Time* 
42 
*Some variants catch interactions 
[Diagram: S source fields feed an arbitrarily complex data mining system, which produces the target field; sensitivity measures the delta in the forecast.] 
Procedure: present record N, S times, each time with one input 5% bigger (a fixed input delta). Record the delta change in the output, S times per record. Aggregate: average(abs(delta)) – the target change per input-field delta. 
For source fields with binned ranges, sensitivity tells you the importance of the range, i.e. “low”, …, “high”. 
Sensitivity values can be put in pivot tables or clustered. 
Record-level “reason codes” can be extracted from the most important bins that apply to the given record.
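
The OAT procedure as a sketch (score_fn stands in for the arbitrarily complex, already-trained system; records is a numpy array of records × source fields):

import numpy as np

def oat_sensitivity(score_fn, records, delta=0.05):
    # Present each record S times, each time with one input 5% bigger, and
    # aggregate average(abs(delta in the forecast)) per source field.
    records = np.asarray(records, dtype=float)
    base = score_fn(records)                         # forecast on the unmodified records
    n_fields = records.shape[1]
    per_record = np.zeros_like(records)
    for j in range(n_fields):
        bumped = records.copy()
        bumped[:, j] *= (1.0 + delta)                # fixed 5% input delta on field j
        per_record[:, j] = score_fn(bumped) - base   # delta change in the output
    sensitivity = np.abs(per_record).mean(axis=0)    # importance of each source field
    return sensitivity, per_record                   # per_record feeds record-level reason codes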
43 
Descriptions of Predictive Models 
Reason Codes – Ranked by Sensitivity Analysis 
• Reason codes are specific to the model and the record 
• Ranked predictive fields: 
Field | Mr. Smith | Mr. Jones 
max_late_payment_120d | 0 | 1 
max_late_payment_90d | 1 | 0 
bankrupt_in_last_5_yrs | 1 | 1 
max_late_payment_60d | 0 | 0 
• Mr. Smith’s reason codes include: 
max_late_payment_90d 1 
bankrupt_in_last_5_yrs 1
Summary 
• Conservative Result (How to Measure) 
– A continuous metric to select accurate and general models 
• Heuristic Meta-Gradient Search (How to Plan) 
– An automated or human process to plan a Design of Experiments (DOE) 
– Searches the training parameters that a data miner adjusts in data mining software (“meta-parameter search”) 
– Heuristic DOE improvements 
• Most systems can be “reasonably described” 
– Focus on repeatable business benefit (accuracy) over description or a blind Occam’s Razor on a tech metric 
44 
SF Bay ACM, Data Mining SIG, Feb 28, 2011 
http://www.sfbayacm.org/?p=2464 
Greg_Makowski@yahoo.com 
www.LinkedIn.com/in/GregMakowski 
Take Away: The process of going from design objectives to heuristic design

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Optimization
OptimizationOptimization
Optimization
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Credit risk meetup
Credit risk meetupCredit risk meetup
Credit risk meetup
 
Setting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep LearningSetting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep Learning
 
L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์ เอื้อวัฒนามงคล
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์  เอื้อวัฒนามงคลMachine Learning: An introduction โดย รศ.ดร.สุรพงค์  เอื้อวัฒนามงคล
Machine Learning: An introduction โดย รศ.ดร.สุรพงค์ เอื้อวัฒนามงคล
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 

Andere mochten auch

Javascript frameworks
Javascript frameworksJavascript frameworks
Javascript frameworks
sigmaray
 
Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial research
pbbharate
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Greg Makowski
 

Andere mochten auch (20)

Лекция 6 Планирование эксперимента
Лекция 6 Планирование экспериментаЛекция 6 Планирование эксперимента
Лекция 6 Планирование эксперимента
 
Linked In Slides 2009 02 24 B
Linked In Slides 2009 02 24 BLinked In Slides 2009 02 24 B
Linked In Slides 2009 02 24 B
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09
 
SFbayACM ACM Data Science Camp 2015 10 24
SFbayACM ACM Data Science Camp 2015 10 24SFbayACM ACM Data Science Camp 2015 10 24
SFbayACM ACM Data Science Camp 2015 10 24
 
The 360º Leader (Section 2 of 6)
The 360º Leader (Section 2 of 6)The 360º Leader (Section 2 of 6)
The 360º Leader (Section 2 of 6)
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
 
LeanUX: Online Design of Experiments
LeanUX: Online Design of ExperimentsLeanUX: Online Design of Experiments
LeanUX: Online Design of Experiments
 
The 360º Leader (Section 1 of 6)
The 360º Leader (Section 1 of 6)The 360º Leader (Section 1 of 6)
The 360º Leader (Section 1 of 6)
 
Design of experiments
Design of experimentsDesign of experiments
Design of experiments
 
Using Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsUsing Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical Applications
 
Application of Design of Experiments (DOE) using Dr.Taguchi -Orthogonal Array...
Application of Design of Experiments (DOE) using Dr.Taguchi -Orthogonal Array...Application of Design of Experiments (DOE) using Dr.Taguchi -Orthogonal Array...
Application of Design of Experiments (DOE) using Dr.Taguchi -Orthogonal Array...
 
Three case studies deploying cluster analysis
Three case studies deploying cluster analysisThree case studies deploying cluster analysis
Three case studies deploying cluster analysis
 
Javascript frameworks
Javascript frameworksJavascript frameworks
Javascript frameworks
 
Генетические алгоритмы
Генетические алгоритмыГенетические алгоритмы
Генетические алгоритмы
 
Reformulation Strategies Based On Design Of Experiments (DOE) Enhancement Of ...
Reformulation Strategies Based On Design Of Experiments (DOE) Enhancement Of ...Reformulation Strategies Based On Design Of Experiments (DOE) Enhancement Of ...
Reformulation Strategies Based On Design Of Experiments (DOE) Enhancement Of ...
 
Decision Tree 
• Primary Parameters to vary 
– Criterion 
• Probchisq (Default) 
• Entropy 
• Gini 
– Assessment (Decision vs. Lift) 
– Tree size (depth, leaf size, Xvalid) 
12
Gradient Boosting (Tree Based) 
Based on “Greedy Function Approximation: A Gradient Boosting Machine” by Jerome Friedman 
Each new CART tree: 
• Is fit on a 60% random sample 
• Is a small, general tree 
• Forecasts the residual error of the summed forecast from all previous trees 
• May have 50 to 2,000 trees in a sequence 
• Evaluate how far “back” in the sequence to prune 
(a minimal boosting sketch follows below) 
13
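A minimal sketch of the boosting loop described on this slide, using scikit-learn's DecisionTreeRegressor as the small CART-style base learner. The 60% subsample matches the slide; the tree depth, learning rate, tree count and function names are illustrative assumptions, not settings from the deck.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, n_trees=200, depth=3, lr=0.1, subsample=0.6, seed=42):
    """Fit small trees in sequence; each tree models the current residual error."""
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), y.mean())   # start from the global mean forecast
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        residual = y - pred             # error of the summed forecast so far
        tree = DecisionTreeRegressor(max_depth=depth)
        tree.fit(X[idx], residual[idx])
        pred += lr * tree.predict(X)    # add a damped correction
        trees.append(tree)
    return y.mean(), trees

def boost_predict(base, trees, X, lr=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += lr * tree.predict(X)
    return pred
```

Evaluating successively shorter prefixes of `trees` on the validation split is one way to decide how far back in the sequence to prune.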
DM Algorithms Available in Packages 
14 
# Modules per Forecasting Family in DM Software (one row per package) 
Regression  Lasso Reg  Decision Tree  Neural Net  Support Vector Mach  Other  TOT 
    2           1            0             0                0            1      4 
    0           0            1             0                0            0      1 
    3           0            3             3                0            3     12 
    1           0            1             0                1            1      4 
    0           0            4             0                0            0      4 
    3           2            5             3                2            3     18 
    0           0            0             0                0            5      5 
Feel Overwhelmed by Lots of Complex Algorithm Parameters? GOOD! 
• A deep understanding of algorithms, math and assumptions helps significantly  Heuristics 
– i.e. typically, regression has a problem with correlated inputs because the solution 
calculation uses matrix inversion (if you are worried about weight sign inversion) 
– see the small numeric illustration below 
– SVMs or Bayesian Nets do not have this problem, because they are solved differently. 
• Without the correlated-input problem, input selection becomes more random 
– but you still get a decent solution 
• How can you manage the details? 
– I am glad you asked…. Moving on to the next section 
15
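A tiny numeric illustration of the point above (my own example, not from the deck): with two nearly identical inputs, ordinary least squares can split the true weight between them erratically, so individual coefficients can blow up or flip sign between runs that differ only by noise, even though the fitted predictions barely change.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)      # nearly collinear copy of x1
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)  # only x1 truly matters

for trial in range(3):
    y_trial = y + rng.normal(scale=0.01, size=n)   # tiny perturbation of the target
    X = np.column_stack([np.ones(n), x1, x2])
    coefs, *_ = np.linalg.lstsq(X, y_trial, rcond=None)
    print(trial, np.round(coefs, 2))   # b1 and b2 can be large and opposite-signed,
                                       # while b1 + b2 stays near 3
```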
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System – Sensitivity 
16
Model Exploration Process 
• Scientific Method of Hypothesis  Test 
– If you change ONE thing, then any change in the results is because of that one change 
– Design of Experiments (DOE), test plan 
– Best to compare model settings on the same data version 
• New data versions add new preprocessed fields, or new months (records) 
– Key design objective: all experiments are reproducible 
• SAME random split between Learning – Test – Validation, with a consistent random seed 
– LTV split before loading data in a tool, so the same partitioning is used for all 
tools/libraries/algorithms (a split sketch follows below) 
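A minimal sketch of the reproducible Learning–Tuning–Validation split described above, done once before any tool-specific loading so every algorithm sees the same partitioning. The 60/20/20 proportions, the target column name, the file name and the seed value are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def ltv_split(df, target="bad_flag", seed=20110228):
    """Stratified Learn / Tune / Validate split with a fixed seed so every
    tool, library and algorithm works from exactly the same partitions."""
    learn, rest = train_test_split(
        df, test_size=0.40, random_state=seed, stratify=df[target])
    tune, valid = train_test_split(
        rest, test_size=0.50, random_state=seed, stratify=rest[target])
    return learn, tune, valid

# df = pd.read_csv("project_data_v1.csv")   # hypothetical file, fed to every tool
# learn, tune, valid = ltv_split(df)
```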
Model Notebook 
18 
Input Parameters | Outcomes: Lift in Top 10% (Train, Val, Gap = Abs(Trn-Val), Consrv Result) 
Data Ver  Algor   Mod Num  Param 1       Param 2                  Param 3      Arch  Vars Seltd  Trn Time  Train   Val   Gap   Consrv Result 
                           vars offerd   var selct / Hidn Nodes   Direct Conn 
1         Regrsn  1        27            stepw                                       9           12        5.77    5.94  0.17  5.60 
1         Neural  1        27            3                        n            MLP   all         77        6.65   10.89  4.24  2.41 
1         Neural  2        27            10                       n            MLP   all         40        6.88    6.73  0.15  6.58 
1         Neural  3        27            10                       Y            MLP   all         36        6.40    6.93  0.53  5.87 
1         Neural  4        27            10                       n            RBF   all         34        5.67    5.54  0.13  5.41 
1         Neural  5        27            10                       Y            RBF   all         35        5.95    7.92  1.97  3.98 
(slide annotation: “Bad vs. Good”) 
Model Notebook Outcome Details 
• My Heuristic Design Objectives: (yours may be different) 
– Accuracy in deployment 
– Reliability and consistent behavior, a general solution 
• Use one or more hold-out data sets to check consistency 
• Penalize more, as the forecast becomes less consistent 
– No penalty for model complexity (if it validates consistently) 
• Let me drive a car to work, instead of limiting me to a bike 
– Message for the check writer 
– Don’t consider only Occam’s Razor: value consistent good results 
– Develop a “smooth, continuous metric” to sort and find models that perform “best” in future deployment 
19
Model Notebook Outcome Details 
• Training = results on the training set 
• Validation = results on the validation hold-out 
• Gap = abs( Training – Validation ) 
A bigger gap (volatility) is a bigger concern for deployment, a symptom 
Minimize Senior VP heart attacks! (one penalty for volatility) 
Set expectations & meet expectations 
Regularization helps significantly 
• Conservative Result = worst( Training, Validation ) + Gap_penalty 
Corr / Lift / Profit  higher is better: Cons Result = min(Trn, Val) – Gap 
MAD / RMSE / Risk  lower is better: Cons Result = max(Trn, Val) + Gap 
Business Value or Pain ranking = function of( conservative result ) 
(see the helper sketch below) 
20
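The conservative-result rules above translate directly into a small helper; a sketch assuming lift-style metrics where higher is better and error-style metrics where lower is better.

```python
def conservative_result(train, val, higher_is_better=True):
    """Penalize the train/validation gap once, on top of the worse of the two scores."""
    gap = abs(train - val)
    if higher_is_better:           # correlation / lift / profit
        return min(train, val) - gap
    return max(train, val) + gap   # MAD / RMSE / risk

# Example from the notebook: Train lift 6.65, Val lift 10.89 -> 6.65 - 4.24 = 2.41
print(conservative_result(6.65, 10.89))           # ~2.41, the Neural model 1 row
print(conservative_result(0.31, 0.35, False))     # 0.39 for an RMSE-style metric
```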
Model Notebook 
21 
(the same Model Notebook table as slide 18, repeated here after the outcome definitions: 
Input Parameters on the left, Lift-in-Top-10% outcomes with Train, Val, Gap and 
Conservative Result on the right) 
Model Notebook Process Tracking Detail  Training the Data Miner 
(detailed tracking spreadsheet, Project = Transit, Last Update 5/6/2010: one row per 
trained model with Data Ver, Author, Algorithm, Model Num, change-from-prior, the 
algorithm-specific training parameters, variables selected, training time, and Train / 
Val / Gap / Conservative Result for Lift in the Top 5%, 10% and 20% – covering the 
logistic, regression, DM Regression, PLS, AutoNeural, Neural and DMNeural trials on 
data versions 1 through 4n, with margin notes such as “investigate inconsistency” and 
“didn’t finish, out of memory”) 
“The Data Mining Battle Field” 
More Heuristic Strategy: 
1) Try a few models of many algorithm types (seed the search) 
2) Opportunistically spend more effort on what is working (invest in top stocks) 
3) Still try a few trials on medium success (diversify, limited by project time-box) 
4) Try ensemble methods, combining model forecasts & top source vars w/ a 2nd stage model 
Model Notebook Process Tracking Detail  Training the Data Miner 
(the same tracking spreadsheet, continued for the Decision Tree and Rule Induction 
trials: criterion, max depth, leaf size, assessment, decision weights, and the Train / 
Val / Gap / Conservative Result columns across data versions 1 through 4n; margin 
notes record findings such as “interactions are getting selected, improve Trn results 
but decrease Val results – perhaps regen the INT*dbc with a larger number of min 
records” and “use RAW vars ONLY, to test value of my preprocessing”) 
“Agile Software Design” 
Get something simple, fully working and tested early on (Data Version 1) 
Data Version 2…4 
Working, incremental improvements 
Incremental complexity 
Different preprocessing 
Add more fields, records 
Add & test more complexity 
Model Notebook Process Tracking Detail  Training the Data Miner 
(the same Decision Tree / Rule Induction tracking spreadsheet as slide 23, shown again 
to make the meta-data point) 
Can treat the model notebook table as meta-data (i.e. 144 records, or models) 
Train models on the meta-data 
Source vars = model parameters 
Target 1 = conservative result, or Target 2 = training time 
Perform sensitivity analysis to answer questions: 
Q) Which model training parameter searches lead to the best results? 
Q) … the most training time? 
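A sketch of the meta-data idea above: load the notebook rows, train a model with the training parameters as inputs and the conservative result (or training time) as the target, then rank which parameters matter most. The file name, column names, and the choice of a random forest with permutation importance are my assumptions, not part of the deck.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

notebook = pd.read_csv("model_notebook.csv")          # hypothetical export, one row per trained model
params = ["vars_offered", "criterion", "max_depth", "leaf_size"]   # hypothetical columns
X = pd.get_dummies(notebook[params])                  # encode categorical settings
y = notebook["conservative_result"]                   # or "train_time_sec"

meta = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
imp = permutation_importance(meta, X, y, n_repeats=20, random_state=0)
ranking = pd.Series(imp.importances_mean, index=X.columns).sort_values(ascending=False)
print(ranking)   # Q: which training parameters drive the best (or slowest) models?
```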
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System – Sensitivity 
25
Design Of Experiments (DOE) Parameter Search 
• Ideally, vary one parameter at a time, quantify the results 
– Bigger challenge in BIG DATA compute per model 
• Exhaustive Grid Search O(3^P) 
– for Param A = Low, Med, High (test 3 settings) 
– for Param B = Low, Med, High 
– for Param C = Low, Med, High 
– easy to implement, not the most efficient 
– Can use Fractional Factorial design (i.e. 10%) – see the sketch after this slide 
• Scales less effectively for many parameters 
• Stochastic Search (Genetic Algorithms) O(100^2) 
– Directed Random Search is more efficient than Grid Search, but… 
– Can be overkill in complexity: (100 models / generation) * (100’s of gens) 
• Taguchi Analysis (works with this DOE approach) 
– Efficient multivariate orthogonal search 
– test landing pages w/ Offermatica (acquired by Omniture in 2007 for DOE) 
– http://en.wikipedia.org/wiki/Taguchi_methods 
– Does not use domain knowledge of parameter interactions - OPPORTUNITY 
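A minimal sketch of the exhaustive O(3^P) grid from the slide above, with a simple random 10% fractional-factorial thinning; `train_and_score` is a placeholder for whatever trains one model and returns its conservative result, and the example parameter values are illustrative.

```python
import itertools
import random

grid = {
    "criterion": ["probchisq", "entropy", "gini"],   # Param A: 3 settings
    "max_depth": [6, 12, 20],                        # Param B
    "leaf_size": [5, 100, 2000],                     # Param C
}

combos = list(itertools.product(*grid.values()))     # 3**3 = 27 full-factorial runs
random.seed(0)
fraction = random.sample(combos, k=max(1, len(combos) // 10))   # ~10% fractional design

for values in fraction:
    settings = dict(zip(grid.keys(), values))
    # result = train_and_score(settings)   # placeholder: one trained model per combination
    print(settings)
```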
Taguchi Design 
• Not a full grid search 
• Can we improve with experience and a heuristic process? 
27 
http://www.itl.nist.gov/div898/handbook/pri/section5/pri56.htm 
http://www.jmp.com/support/downloads/pdf/jmp_design_of_experiments.pdf
Model Parameters 
                 Algorithm Searches       Meta-Search by a Data Miner: 
                                          Design of Experiments (DOE) Over Your Choices 
Algorithm        Model Parameters         Model Training Parameters 
Regression       weights                  variable select (forward, step) 
Neural net       weights                  step size; learning rate 
Decision Tree    (spend < $1000)          max depth; (Gini, Entropy) 
28
Model Parameters vs. Model Training Parameters 
                 Algorithm Searches       Meta-Search by a Data Miner: 
                                          Design of Experiments (DOE) Over Your Choices 
Algorithm        Model Parameters         Model Training Parameters 
Regression       weights                  variable select (forward, step) 
Neural net       weights                  step size; learning rate 
Decision Tree    (spend < $1000)          max depth; (Gini, Entropy) 
29
Heuristic Planning Your Design of Experiments (DOE) 
• Assumptions about the Data Mining Project 
– May be on BIG DATA, with practical constraints 
– May be training 4 to 400 models (not 4000+ like GA) 
– Want diversity, to investigate different algorithms 
– Want to generalize the process to future deployments 
• Heuristic Strategies 
– Use knowledge of interacting parameters (parallel tests) 
• (Cost+profit weights) and (boosting weights) fight each other 
– Delay searching compute-intensive parameters 
• First stabilize most other “computationally reasonable” params 
• Large decision tree depth, 
• neural nets w/ lots of connections 
– Opportunistically spend time by algorithm success 
30
Gradient Descent Numerical Methods 
Searching to Find Minima 
31 
(figure: an error surface over Weight Parameter 1 and Weight Parameter 2, shaded from 
High Error “hill tops” down through forest, fields and beach to Low Error “water”, with 
several local minima marked “Min”) 
Gradient Descent Numerical Methods 
Searching to Find Minima 
32 
“Ski Down” from the mountains to Lake Tahoe 
Moving = adjust param 
X = starting position 
M = a local minimum 
(same error-surface figure as slide 31, now annotated with a starting position X and 
the local minima M that the descent can reach) 
Conservative Result with Respect to Model Training Parameters 
33 
“Ski Down” from the mountains to Lake Tahoe 
Moving = adjust param 
X = starting position 
M = a local minimum 
(the same High Error / Low Error landscape figure, with the axes now Model Parameter 1 
and Model Param 2 – the surface is the error as a function of the training parameters) 
Heuristic Planning Your Design of Experiments (DOE) 
• Start with a reasonable default setting of parameters, 
– the “center of the daisy”  the gradient check 
• Vary one parameter at a time from the center 
– “each petal of the daisy”  gradient search trial 
• Move to the next “reasonable multivariate start” 
– the “stem of the daisy”  steepest descent 
(a search-loop sketch follows below) 
34
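A sketch of the daisy pattern above: score the center, score one-parameter-at-a-time "petals", then move the center to the best petal and repeat. `train_and_score` is again a placeholder for a full model build that returns the conservative result; the loop structure is my reading of the slide, not code from the deck.

```python
def daisy_search(center, candidates, train_and_score, n_moves=3):
    """center: dict of current settings; candidates: dict param -> alternative values."""
    best_center, best_score = dict(center), train_and_score(center)
    for _ in range(n_moves):                        # each pass = one daisy
        petals = []
        for param, values in candidates.items():   # vary ONE parameter per petal
            for value in values:
                if value == best_center[param]:
                    continue
                petal = dict(best_center, **{param: value})
                petals.append((train_and_score(petal), petal))
        score, petal = max(petals, key=lambda p: p[0])   # steepest ascent on conservative result
        if score <= best_score:
            break                                   # no petal improves: stay at this center
        best_center, best_score = petal, score      # "stem of the daisy": move the center
    return best_center, best_score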
Heuristic “Meta-Gradient Search” of Model Training Parameters 
35 
(figure: trial settings plotted on the High Error / Low Error surface over Parameter 1 
and Parameter 2, with the minimum marked M) 
Heuristic “Meta-Gradient Search” of Model Training Parameters 
36 
(figure: a further step of the same search on the Parameter 1 / Parameter 2 error 
surface, again with the minimum marked M) 
Heuristic “Meta-Gradient Search” of Model Training Parameters 
37 
(figure: the meta-gradient search path over Parameter 1 and Parameter 2, ending at M, 
shown next to a Taguchi DOE layout) 
vs. Taguchi DOE 
Art vs. Science? No, a practical complement using existing numerical methods
Heuristic “Meta-Gradient Search” of Model Training Parameters 
38 
Can you give a more tangible example? This sounds a bit vague. 
“Change from Prior Model” tracks the change from the “center of a daisy” (Model 1 or 3): 
Mod Num  chng from prior  vars offered  criterion  max depth  leaf size 
1        0                27            default    6          5 
2        1                27            probchisq  6          5 
3        1                27            entropy    6          5 
4        1                27            gini       6          5 
5        3                27            entropy    12         5 
6        3                27            entropy    6          10 
7        3                27            entropy    6          100 
8        3                27            entropy    6          100 
9        3                27            entropy    6          5 
10       3                27            entropy    6          5 
11       3                27            entropy    6          5 
12       3                27            entropy    10         2 
Heuristic “Meta-Gradient Search” of Model Training Parameters 
• After stabilizing most of the “fast” and “medium” compute time parameters, 
search the “long compute time” settings 
• With the final parameter settings, if 2x or 10x more data is available, 
perform a “final bake in,” long training run 
• Then try Ensemble Methods 
– Stacking, boosting, bagging: combining many of the best models 
– Gradient Boosting over residual error 
– Select models whose residual errors correlate the least 
– Use a 2nd stage model to combine 1st stage models and top preprocessed fields 
(for context switching) – see the stacking sketch below 
– Last year’s KDD Cup winners and the Netflix winners used Ensemble methods 
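A sketch of the 2nd-stage stacking idea from the list above: feed the 1st-stage model forecasts, plus a few top preprocessed fields for context, into a simple combiner. The specific 1st-stage models, their settings and the use of the tuning split for the 2nd stage are my illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

def fit_stack(X_learn, y_learn, X_tune, y_tune, top_fields):
    """Stage 1 fit on the learn split; stage 2 fit on the tune split to avoid
    combining the stage-1 models on their own training error."""
    stage1 = [
        LogisticRegression(max_iter=1000).fit(X_learn, y_learn),
        DecisionTreeClassifier(max_depth=6).fit(X_learn, y_learn),
        MLPClassifier(hidden_layer_sizes=(10,), max_iter=500).fit(X_learn, y_learn),
    ]
    scores = np.column_stack([m.predict_proba(X_tune)[:, 1] for m in stage1])
    meta_inputs = np.column_stack([scores, X_tune[:, top_fields]])  # forecasts + context fields
    stage2 = LogisticRegression(max_iter=1000).fit(meta_inputs, y_tune)
    return stage1, stage2
```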
Outline 
Model Training Parameters in SAS Enterprise Miner 
Tracking Conservative Results in a “Model Notebook” 
How to Measure Progress 
Meta-Gradient Search of Model Training Parameters 
How to Plan and dynamically adapt 
How to Describe Any Complex System 
Sensitivity Analysis 
40
Needs to Describe the Forecast Alg 
• Many Data Mining solutions need description 
– To the check writer (SVP, owner, business unit, …) – a business reality check before deployment 
– “What if” analysis, to fine-tune a larger system 
• Feed Operations Research or Revenue Management systems 
– Need a modeling “descriptive simulation” (political donations) 
– When evaluating credit, by law required to offer 4 “reason codes” for each person 
scored – when they are declined 
• Should the Data Miner cut algorithm choices? 
– NO! “I understand how a bike works, but I drive a car to work” 
– how much detailed understanding is needed? 
– Provide enough info to “drive the car” vs. “build the car” 
• The check writer does not need to understand a B-tree to buy SQL 
41
Sensitivity Analysis (OAT) One At a Time* 
*Some catch interactions 
42 
(diagram: Source fields  Arbitrarily Complex Data Mining System (S)  Target field, 
measuring the delta in the forecast per input) 
• Present record N, S times, each input 5% bigger (fixed input delta) 
• Record the delta change in output, S times per record 
• Aggregate: average(abs(delta)), target change per input field delta 
• For source fields with binned ranges, sensitivity tells you the importance of the 
range, i.e. “low”, …, “high” 
• Can put sensitivity values in Pivot Tables or a Cluster 
• Record level: “Reason codes” can be extracted from the most important bins that 
apply to the given record 
(an OAT sketch follows below) 
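A sketch of the one-at-a-time procedure above: present each record S times, grow one input by a fixed 5% each time, record the change in the forecast, and aggregate the average absolute delta per field. `score_fn` stands in for the arbitrarily complex trained system; it is a placeholder name.

```python
import numpy as np

def oat_sensitivity(score_fn, X, delta=0.05):
    """X: (N, S) numeric matrix. Returns per-record deltas and the per-field average |delta|."""
    base = score_fn(X)
    n_fields = X.shape[1]
    record_deltas = np.zeros((X.shape[0], n_fields))
    for j in range(n_fields):                    # one field at a time
        bumped = X.copy()
        bumped[:, j] *= (1.0 + delta)            # fixed 5% input bump
        record_deltas[:, j] = score_fn(bumped) - base
    field_sensitivity = np.abs(record_deltas).mean(axis=0)
    return record_deltas, field_sensitivity
```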
Descriptions of Predictive Models 
Reason Codes – Ranked by Sensitivity Analysis 
43 
• Reason codes are specific to the model and record 
• Ranked predictive fields         Mr. Smith   Mr. Jones 
max_late_payment_120d                  0           1 
max_late_payment_90d                   1           0 
bankrupt_in_last_5_yrs                 1           1 
max_late_payment_60d                   0           0 
• Mr. Smith’s reason codes include: 
max_late_payment_90d 1 
bankrupt_in_last_5_yrs 1 
(a reason-code extraction sketch follows below) 
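Building on the OAT sketch above, per-record reason codes can be read off by ranking each scored record's absolute deltas; the helper and the field names below are hypothetical and simply mirror the slide's example.

```python
import numpy as np

def reason_codes(record_deltas, field_names, top_k=4):
    """Return the top_k most influential fields for each scored record (by |delta|)."""
    order = np.argsort(-np.abs(record_deltas), axis=1)[:, :top_k]
    return [[field_names[j] for j in row] for row in order]

# fields = ["max_late_payment_120d", "max_late_payment_90d",
#           "bankrupt_in_last_5_yrs", "max_late_payment_60d"]
# codes = reason_codes(record_deltas, fields, top_k=2)   # e.g. the two codes for Mr. Smith
```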
Summary 
• Conservative Result (How to Measure) 
– Continuous metric to select accurate and general models 
• Heuristic Meta-Gradient Search (How to Plan) 
– An automated or human process to plan a Design of Experiments (DOE) 
– Searches the training parameters that a data miner adjusts in data mining software 
(“meta-parameter search”) 
– Heuristic DOE improvements 
• Most systems can be “reasonably described” 
– Focus on repeatable business benefit (accuracy) over description or blind Occam’s 
Razor on a tech metric 
44 
SF Bay ACM, Data Mining SIG, Feb 28, 2011 
http://www.sfbayacm.org/?p=2464 
Greg_Makowski@yahoo.com 
www.LinkedIn.com/in/GregMakowski 
Take Away: The process of going from design objectives to heuristic design