Despite the wide array of advanced techniques available today, too many practitioners are forced back to an older toolkit of approaches deemed “more interpretable.” Whether driven by non-legal policy or the difficulty of presenting to executives, these constraints result from poor analytics communication and an inability to explain model risks and outcomes, not from a failing of the techniques.
From sampling to feature reduction to supervised modeling, the toolboxes and communications of data scientists are limited by these constraints. But instead of simplifying their models, data scientists can reintroduce often-ignored statistical practices to describe their models, the associated risks, and the impact of changes in the customer environment.
Even in situations without such restrictions, these approaches will improve how practitioners select models and communicate results. Through measurement and simulation, the approaches reviewed here can articulate the promises, risks, and assumptions of a model without requiring deep statistical explanations.
2. 2
• Director of Data Science at Think Big
• I work at the intersection of statistics and technology
• But also business and analytics
• Too often see data scientists limit themselves and their businesses
Dan Mallinger
3. 3
1. Importance of Communication
2. Lost Tools of Analytics Communication
3. Tricks for those in Regulated Environments
4. More Communication
Today
8. 8
Why We Should Care:
We Won’t Waste Money
Alas, not even a 250Gb server was sufficient: even after waiting three days, the data couldn't even be loaded. […]
Steve said it would be difficult for managers to accept a process that involved sampling.
9. 9
hlm.html('Test1', test1_score__eoy ~ test1_score__boy + ...
         is_special_ed * perc_free_lunch + ...
         other_score * support_rec + ...
         (is_focal | inst_sid), data=kinder)
Technically this is a regression…
So simple anyone can understand it!
Why We Should Care:
You Can’t Explain Your Models Anyway
10. 10
• If your model needs to be re-fit every month, it probably has an eating disorder
• Be a better communicator to yourself
Why We Should Care:
Some of Us Don’t Understand Our Models
18. 18
• If X matters, then shuffling it should hurt our model
• Then bootstrap for confidence intervals on the importance scores
• Most R models have a method for this (see caret); a minimal sketch follows below
Shining a light into the parameters of our black box
Variable Importance
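A minimal sketch of permutation importance in R, assuming a regression model fit with a standard predict method, a data frame df, and an outcome column name outcome (all names illustrative; caret::varImp gives model-specific importances out of the box):

# Permutation importance: shuffle one column and measure how much the
# model's error grows; repeat to get a distribution of scores.
perm_importance <- function(fit, df, outcome, n_reps = 100) {
  baseline <- mean((predict(fit, df) - df[[outcome]])^2)
  vars <- setdiff(names(df), outcome)
  sapply(vars, function(v) {
    drops <- replicate(n_reps, {
      shuffled <- df
      shuffled[[v]] <- sample(shuffled[[v]])  # break v's link to the outcome
      mean((predict(fit, shuffled) - df[[outcome]])^2) - baseline
    })
    quantile(drops, c(0.05, 0.5, 0.95))  # a rough confidence band per variable
  })
}

Repeated shuffles stand in for the bootstrap here; resampling rows before each shuffle would give a true bootstrap band.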
19. 19
Shining a light into the parameters of our black box
Variable Importance: Bob’s Data
20. 20
• Similar to variable importance
• How do relationships in our model play out in different settings?
• How much does our model depend on accurate measurement? (sketch below)
Sensitivity and Robustness
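A minimal sensitivity check in R, assuming the same illustrative fit and df as before: nudge one numeric input by a fraction of its standard deviation and watch how far the predictions move.

# Sensitivity: shift one input by delta standard deviations and report
# the average change in prediction. A model that swings on a small nudge
# depends heavily on accurate measurement of that variable.
sensitivity <- function(fit, df, var, delta = 0.1) {
  perturbed <- df
  perturbed[[var]] <- perturbed[[var]] + delta * sd(perturbed[[var]])
  mean(predict(fit, perturbed) - predict(fit, df))
}

Sweeping delta over a grid, or perturbing several variables at once, turns this single number into the robustness story the bullets above ask for.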
21. 21
Sensitivity and Robustness Example
My code wasn’t working, so thanks to:
https://beckmw.wordpress.com/2013/10/07/sensitivity-analysis-for-neural-networks/
23. 23
• Bob’s manager has told him that black box models are not allowed
• But Bob’s neural net performed better than anything else. Oh dear!
Dang!
24. 24
• Bob’s work in neural nets can be leveraged!
• Generically: Prototype selection
• Identify points on the decision boundary to improve the model
• Specifically: Extracting decision trees from neural nets (a sketch follows the methodology slide)
Blackbox to Whitebox
25. 25
Blackbox to Whitebox: Methodology
“Extracting Decision Trees from Trained Neural Networks” - Krishnan & Bhattacharya
Also: https://github.com/dvro/scikit-protopy
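Not the exact procedure from the paper above, but a sketch of the same idea via distillation, assuming a data frame df with a factor outcome y (illustrative names): let the trained net label the data, then grow an interpretable tree that mimics those labels.

library(nnet)   # the black box
library(rpart)  # the white box

# Distillation sketch: the tree becomes a white-box stand-in for the
# net, with splits that can be read and explained. This illustrates the
# general idea; Krishnan & Bhattacharya give a more specific extraction.
nn <- nnet(y ~ ., data = df, size = 10, decay = 0.01)
df$nn_label <- factor(predict(nn, df, type = "class"))
surrogate <- rpart(nn_label ~ . - y, data = df)

The surrogate won't agree with the net everywhere; checking their agreement on held-out data tells you how faithful the white-box story is.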
26. 26
• Bob has shown how variables impact his black box
• He’s shown how they behave in different contexts
• He’s shown how robust they are to errors
• But he hasn’t told us why we should care
Now What?
28. 28
• Enterprises are slow: Predict KPI not KRI
• Give confidence bands, sensitivities, and impact of context changes
• Build a story about the model internals and assumptions; tie it to the audience’s domain knowledge
• Explainability is up to the modeler, not the model *
• Unless, of course, your regulator says otherwise!
Conclusions