This document discusses methods for interpreting complex predictive models, specifically focusing on determining variable importance. It introduces input shuffling as a technique to assess variable importance for any model, including linear regression, decision trees, neural networks, and ensembles. Input shuffling randomly shuffles values of one input variable at a time and measures the impact on model predictions to determine how influential each variable is to the model. The document demonstrates input shuffling on both regression and classification tasks.
1. When Model Interpretation Matters: Understanding Complex Predictive Models
Dean Abbott
Co-Founder and Chief Data Scientist, SmarterHQ
Twitter: @deanabb
11. Other Ways to Compute Neural Network Sensitivities
Such as… http://www.palisade.com/downloads/pdf/academic/DTSpaper110915.pdf
And ftp://ftp.sas.com/pub/neural/importance.html#mlp_parder_interp
• Weight tracing – sum of product of weights (and variants)
• Partial derivatives – avg, avg absolute, squared, etc.
• Remove variable, compute change in accuracy
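As one illustration of the partial-derivative approach in the list above, here is a minimal model-agnostic sketch using finite differences (the predict callable, function name, and epsilon are assumptions, not from the talk):

```python
import numpy as np

def avg_abs_partial_derivatives(predict, X, eps=1e-4):
    """Average absolute finite-difference partial derivative of the model
    output with respect to each input (one of the measures listed above)."""
    X = np.asarray(X, dtype=float)
    sens = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        hi, lo = X.copy(), X.copy()
        hi[:, j] += eps   # nudge input j up for every record
        lo[:, j] -= eps   # and down
        sens[j] = np.mean(np.abs((predict(hi) - predict(lo)) / (2 * eps)))
    return sens
```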
12. Naïve Bayes Model Outputs
Essentially a series of cross-tabs for every variable!
Remember, the final probability is the product of the individual variable probabilities.
14. What About Model Ensembles?
[Diagram: 10s to 100s of trees feed into decision logic that combines them into the ensemble prediction]
16. Outline
• Classical variable importance: linear regression
• Hack #1: using linear regression model statistics to infer variable importance
17. The Data: Easiest Possible!
• 3 inputs: each is a random Normal: mean = 20, std = 5
• Target variable: 0.5*var1 + 0.2*var2 + 0.3*var3
• 95,412 records (same size as cup98lrn)
18. Linear Regression Coefficient for Each Variable to Assess Influence
• Coefficients match (by definition) the proportions used to build the target variable
• This is the average influence of each input on the predictions for all records
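As a minimal sketch of this baseline (assuming numpy, pandas, and scikit-learn; variable names are illustrative), the snippet below generates the idealized data and shows that the fitted coefficient proportions recover the 0.5/0.2/0.3 weights:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 95_412  # same record count as cup98lrn

# Three independent Normal(20, 5) inputs; exact linear target
X = pd.DataFrame({f"var{i}": rng.normal(20, 5, n) for i in (1, 2, 3)})
y = 0.5 * X["var1"] + 0.2 * X["var2"] + 0.3 * X["var3"]

model = LinearRegression().fit(X, y)
coefs = np.abs(model.coef_)
# Coefficient proportions recover the generating weights: ~0.5, 0.2, 0.3
print(dict(zip(X.columns, coefs / coefs.sum())))
```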
19. Assess Influence with t-Proportion for Each Variable
• I know I'm breaking rules here. Bear with me…
20. Assess Influence with t-Proportion for Each Variable
• The t-value measures the significance of the relationship
• It turns out that the proportion of the t-values for the exact model matches the coefficients
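One way to reproduce this claim is with statsmodels (a sketch; a small noise term is added as an assumption so the t-statistics stay finite on the otherwise exact data):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = pd.DataFrame({f"var{i}": rng.normal(20, 5, 95_412) for i in (1, 2, 3)})
# Tiny noise keeps the standard errors (and t-values) finite
y = 0.5 * X["var1"] + 0.2 * X["var2"] + 0.3 * X["var3"] + rng.normal(0, 0.1, len(X))

fit = sm.OLS(y, sm.add_constant(X)).fit()
t = fit.tvalues.drop("const").abs()
# Because the inputs share the same scale, the |t| proportions track the coefficients
print(t / t.sum())
```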
21. Assess Influence Using a Direct Measure of Influence Proportion
• Compute the contribution of each term in the linear regression model separately (for each record)
– Var1_influence = $var1coef$ * $var1$, etc.
• Compute each contribution's proportion of the predicted target variable value
• Average these proportions over all records to compute the average influence of each variable
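A sketch of this per-record calculation on the same synthetic data (names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = pd.DataFrame({f"var{i}": rng.normal(20, 5, 95_412) for i in (1, 2, 3)})
y = 0.5 * X["var1"] + 0.2 * X["var2"] + 0.3 * X["var3"]
model = LinearRegression().fit(X, y)

# Per-record term contributions: varJ_influence = varJcoef * varJ
contrib = (X * model.coef_).abs()
# Each term's proportion of the record's total contribution, averaged over records
proportions = contrib.div(contrib.sum(axis=1), axis=0)
print(proportions.mean())  # average influence of each variable
```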
22. So Far So Good
• Now let's do the same thing for
– Neural Networks
– Support Vector Machines
24. Motivation for Input Shuffling
http://www.elderresearch.com/company/target-shuffling
25. Why “Input Shuffling”?
• We don't always have nice metrics to assess the inputs of predictive models -- Neural Networks, SVMs, ensembles
– Contrast with statistical methods like regression
• Even with regression, we don't always have the right input distributions for these metrics to be good indicators of variable influence
27. What Does “Shuffled” Mean?
• Scramble (randomly) a single input variable
– The Input Shuffling node doesn't have to be in a loop; it can scramble one column while leaving the others in their natural order
• Shuffling captures the actual distribution of the data
28. Principles of Input Shuffling
• Key: randomly re-populate values of a single input variable while leaving all other variables with their original values
• Compute the standard deviation (or some other measure of perturbation) for each record
– Of the predicted target variable (posterior probability)
– NOT the actual target variable value
• This perturbation is a measure of how influential the variable is in the model
– High standard deviation -> lots of influence
– Low standard deviation -> not much influence
– ~0 standard deviation -> no influence
29. Shuffled Inputs Meta Node
Two loops: (1) loop over the input variables and (2) shuffle the selected input variable (50x or so)
30. The Input Shuffling Process
1. Build the predictive model.
2. Select a data subset of N records (the training set, or some suitably sized set).
3. Loop over every input variable:
   1. Loop M times (50 by default):
      1. Shuffle the variable (keeping all other inputs for that row fixed).
      2. Score the model.
      3. Save the scores for the entire data set (you will end up with M times the N records).
   2. Compute the standard deviation of the predictions for each row (or some other measure of "spread"), i.e., group by Row ID, computing the stdev. Now we have N records again.
   3. Compute the average spread of the input over all N records, such as the mean of these standard deviations, i.e., group over the entire data set. Now we have one number: the variable influence.
4. Compare all results; sort descending by variable influence (see the sketch below).
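A minimal sketch of this loop in Python (assuming a scikit-learn-style model with a predict method; the function and argument names are illustrative):

```python
import numpy as np
import pandas as pd

def input_shuffling_influence(model, X, n_shuffles=50, seed=0):
    """Influence of each input: the average per-row spread of predictions
    when that input alone is repeatedly shuffled."""
    rng = np.random.default_rng(seed)
    influence = {}
    for col in X.columns:                      # outer loop: every variable
        scores = np.empty((n_shuffles, len(X)))
        for m in range(n_shuffles):            # inner loop: M shuffles
            X_shuf = X.copy()
            # Permute this column only; all other inputs keep their values
            X_shuf[col] = rng.permutation(X[col].to_numpy())
            scores[m] = model.predict(X_shuf)
        # Std dev per row across the M shuffles, then averaged over all rows
        influence[col] = scores.std(axis=0).mean()
    return pd.Series(influence).sort_values(ascending=False)
```

Dividing the result by its sum (influence / influence.sum()) gives the influence proportions compared in the slides that follow.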
32. Average for All Records in the Data
• Measures the spread of the predictions when randomly perturbing a single input variable
33. Input Shuffling Result: Idealized Linear Regression Data
• Compute the proportion of the average standard deviation from shuffling each input (keeping the others at their original values)
• (Yes, I know I'm averaging standard deviations!)
Target variable: 0.5*var1 + 0.2*var2 + 0.3*var3
34. Realistic Data: KDD Cup 1998
• 95,412 records: cup98lrn from the KDD Cup 1998 competition
– Use only the responders (4,843) in the linear regression models
• Hundreds of fields in the data, but only four are used here:
– LASTGIFT, NGIFTALL, RFA_2F, D_RFA_2A
• Continuous target
• Two continuous inputs
• One ordinal input (RFA_2F)
• One dummy input (D_RFA_2A)
35. Realistic Data: KDD Cup 1998
• Heavy skew of LASTGIFT, NGIFTALL, TARGET_D
– Makes visualization difficult
– Biases regression coefficients (if one cares)
– So, apply the usual “best practices”
36. Normalized Data
• To remove influence of skew and scale
– Log10 transform LASTGIFT, NGIFTALL, TARGET_D
– Scale all variables (post log10) to [0, 1]
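A sketch of these two steps (the +1 shift before the log is an assumption here, to handle zero values; the function name is illustrative):

```python
import numpy as np
import pandas as pd

def normalize(df: pd.DataFrame, log_cols) -> pd.DataFrame:
    out = df.copy()
    for c in log_cols:
        out[c] = np.log10(out[c] + 1)  # +1 shift assumed, to handle zeros
    # Min-max scale every column (post log10) to [0, 1]
    return (out - out.min()) / (out.max() - out.min())

# e.g.: normalize(cup98, log_cols=["LASTGIFT", "NGIFTALL", "TARGET_D"])
```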
37. Normalized Data
• Relationships are clearer
– LASTGIFT has a strong positive correlation with TARGET_D
– NGIFTALL, RFA_2F, and D_RFA_2A all have apparently slight negative correlations with TARGET_D
38. The Basic Model: Linear Regression Coefficient
Use abs() for influence calculations
39. Linear Regression: Compare Influence Using Different Methods
[Charts: Coefficient proportion vs. t-Proportion; use abs() for both the coefficient and t-proportion calculations]
40. Linear Regression: Compare Influence Using Different Methods
[Charts: Coefficient, t-Proportion, Direct Proportion, and Input Shuffling Proportion; use abs() for the coefficient and t-proportion calculations]
41. Linear Regression, Neural Network: Input Shuffling Influence
[Charts: Input Shuffling - LR vs. Input Shuffling - MLP]
42. Applying Input Shuffling to Classification: Logistic Regression
Start simple: just 4 variables (like the regression example)
43. Applying Input Shuffling to Classification: Logistic Regression
[Charts: Influence based on proportion of z-scores vs. influence based on input shuffling]
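For the z-score proportion, a sketch with statsmodels (on synthetic stand-in data generated here for illustration, not the actual KDD Cup fields):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(5_000, 4)),
                 columns=["LASTGIFT", "NGIFTALL", "RFA_2F", "D_RFA_2A"])
p = 1 / (1 + np.exp(-(0.8 * X["LASTGIFT"] - 0.3 * X["NGIFTALL"])))
y = rng.binomial(1, p)  # binary response

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
z = fit.tvalues.drop("const").abs()  # for Logit, .tvalues holds the z-scores
print(z / z.sum())                   # influence as a proportion of |z|
```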
46. Conclusion
• Input shuffling can generate model sensitivity scores for any algorithm, no matter how complex or nonlinear the model is
• Matches linear regression variable influence (t-value proportion)
• Similar to logistic regression variable influence (z-score proportion)
47. Future Work
• If model predictions (scores) are not normally distributed, and if the influence is not uniform, the average overall influence doesn't tell the full story (or may even tell a misleading story) about how valuable the variable is in predicting the target
– Breaking the predictions into bins (deciles or some other number of bins) lets us compute an influence score for every part of the predicted range
– Answers the question: for high predicted values, which variables are most influential?
• Build score influence rather than prediction influence, as sketched below
– Use ROC AUC statistics for each shuffled input, and determine the influence of each variable on the model score rather than the predicted value
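One possible realization of the AUC idea (a sketch, assuming a binary classifier with a predict_proba method; the function name and defaults are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_influence(model, X, y, n_shuffles=50, seed=0):
    """Score influence: the average drop in ROC AUC when one input is shuffled."""
    rng = np.random.default_rng(seed)
    base = roc_auc_score(y, model.predict_proba(X)[:, 1])
    drops = {}
    for col in X.columns:
        aucs = []
        for _ in range(n_shuffles):
            X_shuf = X.copy()
            # Permute this column only; all other inputs keep their values
            X_shuf[col] = rng.permutation(X[col].to_numpy())
            aucs.append(roc_auc_score(y, model.predict_proba(X_shuf)[:, 1]))
        drops[col] = base - float(np.mean(aucs))
    return drops
```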