SlideShare a Scribd company logo
1 of 33
Download to read offline
Show me your
predictive model
Why and how?
Przemysław Biecek
University of Warsaw
Warsaw University of Technology
Do we really need plots to understand numbers?
https://www.autodeskresearch.com/publications/samestats
Package datasauRus
Do we really need plots to understand numbers?
https://www.autodeskresearch.com/publications/samestats
Package datasauRus
Do we really need plots to understand numbers?
Do we need plots to understand models?
Is it all about accuracy?
1) …there is a lot of opportunity to do visualization for machine
learning. Even many of the people working in the field don’t
have good intuitions for how their systems work, and they
need tools to inspect what they’re doing, debug, etc…
https://eagereyes.org/blog/2017/eurovis-2017-conference-report-part-1
Robert Kosara
2) Understanding and trust - we need to understand models
that makes important decisions.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016)
3) Models have assumptions, and we need early warnings
that something is wrong.
Do we need plots to understand models?
1) …there is a lot of opportunity to do visualization for machine
learning. Even many of the people working in the field don’t
have good intuitions for how their systems work, and they
need tools to inspect what they’re doing, debug, etc…
https://eagereyes.org/blog/2017/eurovis-2017-conference-report-part-1
Robert Kosara
2) Understanding and trust - we need to understand models
that makes important decisions.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016)
3) Models have assumptions, and we need early warnings
that something is wrong.
Do we need plots to understand models?
Yuan Tang, Masaaki Horikoshi, and Wenxuan Li.
"ggfortify: Unified Interface to Visualize Statistical
Result of Popular R Packages." The R Journal 8.2 (2016): 478-489.
Package ggfortify
lm
Yuan Tang, Masaaki Horikoshi, and Wenxuan Li.
"ggfortify: Unified Interface to Visualize Statistical
Result of Popular R Packages." The R Journal 8.2 (2016): 478-489.
Yuan Tang, Masaaki Horikoshi, and Wenxuan Li.
"ggfortify: Unified Interface to Visualize Statistical
Result of Popular R Packages." The R Journal 8.2 (2016): 478-489.
aareg
glmnet
lm
Package ggfortify
Fitting Validation Prediction
Life-cycle of a typical prognostic model
+ model fit
+ new data
+ training data
+ model skeleton
Reprodu-
cibility
Fitting Validation Prediction
Life-cycle of a typical prognostic model
+ model fit
+ new data
+ training data
+ model skeleton
What affects predictions?
- global interpretability
- local interpretability
Data validation
- are there outliers?
- is there a need for
variable
transformations?
Validation of assumptions
- residuals diagnostics
- are they ok?
Performance charts,
is it a good model?
- models comparisons
- what is the predictive performance?
Are convergence
criteria satisfied?
What are estimates
for model parameters?
Reprodu-
cibility
Prediction
What affects the prediction for a single observation?
(local interpretability)
Which elements of the model are the most important ones?
(global interpretability)
How a single variable affects the expected output?
(marginal interpretability)
Package randomForestExplainer
What is in a random forest?
Aleksandra Paluszynska, Przemyslaw Biecek (2017)
https://github.com/geneticsMiNIng/BlackBoxOpener
John Ehrlinger (2015)
ggRandomForests: Random Forests for Regression
Package randomForestExplainer
What is in a random forest?
Aleksandra Paluszynska, Przemyslaw Biecek (2017)
https://github.com/geneticsMiNIng/BlackBoxOpener
John Ehrlinger (2015)
ggRandomForests: Random Forests for Regression
pdp: An R Package for Constructing Partial Dependence Plots. Brandon M. Greenwell (2017)
https://journal.r-project.org/archive/2017/RJ-2017-016/index.html
Package pdp - Partial Dependence Plots
pdp: An R Package for Constructing Partial Dependence Plots. Brandon M. Greenwell (2017)
https://journal.r-project.org/archive/2017/RJ-2017-016/index.html
Package pdp - Partial Dependence Plots
LIME: Local Interpretable Model-agnostic Explanations
"Why Should I Trust You?" Explaining the Predictions of Any Classifier.
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016). https://arxiv.org/pdf/1602.04938.pdf
Port to R: Thomas Lin Pedersen (2017) https://github.com/thomasp85/lime
1. Generate a fake dataset around x.

2. Use black-box estimator to get target values y.

3. Train a new white-box estimator for (y,x).

4. Check prediction quality of a white-box classifier.

5. Use white-box estimator as an explanation of black-box model.
LIME: Local Interpretable Model-agnostic Explanations
"Why Should I Trust You?" Explaining the Predictions of Any Classifier.
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016). https://arxiv.org/pdf/1602.04938.pdf
Port to R: Thomas Lin Pedersen (2017) https://github.com/thomasp85/lime
1. Generate a fake dataset around x.

2. Use black-box estimator to get target values y.

3. Train a new white-box estimator for (y,x).

4. Check prediction quality of a white-box classifier.

5. Use white-box estimator as an explanation of black-box model.
Fitting
Are convergence
criteria satisfied?
What are estimates
of model parameters?
Are the assumptions satisfied? Do we need
variable transformation?
Package forestmodel
broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames,
David Robinson (2014) arXiv:1412.3565v2
Nick Kennedy (2017) https://github.com/NikNakk/forestmodel
Package forestmodel
regression
model
broom
forest_model
forestmodel
tidy
model
broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames,
David Robinson (2014) arXiv:1412.3565v2
Nick Kennedy (2017) https://github.com/NikNakk/forestmodel
Package factorMerger
Visualisation for post-hoc testing
Agnieszka Sitko, Przemysław Biecek (2017)
https://github.com/geneticsMiNIng/FactorMerger
Agnieszka Sitko, Przemysław Biecek (2017)
https://github.com/geneticsMiNIng/FactorMerger
Package factorMerger
Visualisation for post-hoc testing
Validation
Data validation
- are there outliers?
- is there a need for variable
transformations?
Validation of assumptions
- residuals diagnostics
- are they ok?
Performance charts,
is it a good model?
- model comparisons
- what is the performance?
Package plotROC
Michael Sachs (2016)
http://sachsmc.github.io/plotROC
Package plotROC
Michael Sachs (2016)
http://sachsmc.github.io/plotROC
Alboukadel Kassambara, Marcin Kosiński (2016)
https://github.com/kassambara/survminer
Diagnostic plots for various residuals:
"martingale", "deviance", "score", "schoenfeld",
"dfbeta", "dfbetas" and "scaledsch"
Package survminer
Alboukadel Kassambara, Marcin Kosiński (2016)
https://github.com/kassambara/survminer
Diagnostic plots for various residuals:
"martingale", "deviance", "score", "schoenfeld",
"dfbeta", "dfbetas" and "scaledsch"
Package survminer
Reproducibility
?? knitr
?? docker
?? packrat
archivist - reproducible and recordable research
library("archivist")
model <- lm(Sepal.Length ~ Sepal.Width, data=iris)
saveToLocalRepo(model)
models <- asearch("pbiecek/graphGallery", patterns = "class:lm")
modelsBIC <- sapply(models, BIC)
sort(modelsBIC)
## 990861c7c27812ee959f10e5f76fe2c3 2a6e492cb6982f230e48cf46023e2e4f
## 39.05577 67.52735
## 0a82efeb8250a47718cea9d7f64e5ae7 378237103bb60c58600fe69bed6c7f11
## 189.73593 189.73593
## 7f11e03539d48d35f7e7fe7780527ba7 c1b1ef7bcddefb181f79176015bc3931
## 189.73593 189.73593
Przemysław Biecek, Marcin Kosiński (2015) https://github.com/pbiecek/archivist
archivist - reproducible and recordable research
library("archivist")
model <- lm(Sepal.Length ~ Sepal.Width, data=iris)
saveToLocalRepo(model)
models <- asearch("pbiecek/graphGallery", patterns = "class:lm")
modelsBIC <- sapply(models, BIC)
sort(modelsBIC)
## 990861c7c27812ee959f10e5f76fe2c3 2a6e492cb6982f230e48cf46023e2e4f
## 39.05577 67.52735
## 0a82efeb8250a47718cea9d7f64e5ae7 378237103bb60c58600fe69bed6c7f11
## 189.73593 189.73593
## 7f11e03539d48d35f7e7fe7780527ba7 c1b1ef7bcddefb181f79176015bc3931
## 189.73593 189.73593
Przemysław Biecek, Marcin Kosiński (2015) https://github.com/pbiecek/archivist
Fitting Validation Prediction
Life-cycle of a typical prognostic model
Reprodu-
cibility
archivist
plotROC
survminer
randomForestExplainer
pdp
forestmodel
factorMerger
ggfortify
lime
What affects predictions?
- global interpretability
- local interpretability
Data validation
- are there outliers?
- is there a need for
variable
transformations?
Validation of assumptions
- residuals diagnostics
- are they ok?
Performance charts,
is it a good model?
- models comparisons
- what is the predictive performance?
Are convergence
criteria satisfied?
What are estimates
for model parameters?
ELI5 is a Python library which allows to visualize and
debug various Machine Learning models
http://eli5.readthedocs.io/en/latest/index.html
Visualizing statistical models: Removing the blindfold.
Hadley Wickham, Dianne Cook, Heike Hofmann (2015)
Statistical Analysis and Data Mining
http://had.co.nz/stat645/model-vis.pdf
Ideas on interpreting machine learning.
Patrick Hall, Wen Phan, SriSatish Ambati (2017)
https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
rms: Regression Modeling Strategies.
Frank E Harrell (2009-2017) CRAN
https://cran.r-project.org/web/packages/rms/index.html
Thank you for your attention!
https://imgs.xkcd.com/comics/like_im_five.png

More Related Content

Similar to UseR 2017

Synergy of Human and Artificial Intelligence in Software Engineering
Synergy of Human and Artificial Intelligence in Software EngineeringSynergy of Human and Artificial Intelligence in Software Engineering
Synergy of Human and Artificial Intelligence in Software EngineeringTao Xie
 
Explainability for Learning to Rank
Explainability for Learning to RankExplainability for Learning to Rank
Explainability for Learning to RankSease
 
Keepler Data Tech | Entendiendo tus propios modelos predictivos
Keepler Data Tech | Entendiendo tus propios modelos predictivosKeepler Data Tech | Entendiendo tus propios modelos predictivos
Keepler Data Tech | Entendiendo tus propios modelos predictivosKeepler Data Tech
 
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Cataldo Musto
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsDomino Data Lab
 
UMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimensionUMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimensionmatthewturk
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at ScaleEoin Hurrell, PhD
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsSrinath Perera
 
Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892WSO2
 
Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Research Data Alliance
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Sri Ambati
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsKrishna Sankar
 
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkThe Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkKrishna Sankar
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsMarcel Kurovski
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine LearningSri Ambati
 
Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...
Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...
Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...Quantopian
 
Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018
Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018
Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018Calin Constantinov
 

Similar to UseR 2017 (20)

Synergy of Human and Artificial Intelligence in Software Engineering
Synergy of Human and Artificial Intelligence in Software EngineeringSynergy of Human and Artificial Intelligence in Software Engineering
Synergy of Human and Artificial Intelligence in Software Engineering
 
Explainability for Learning to Rank
Explainability for Learning to RankExplainability for Learning to Rank
Explainability for Learning to Rank
 
Keepler Data Tech | Entendiendo tus propios modelos predictivos
Keepler Data Tech | Entendiendo tus propios modelos predictivosKeepler Data Tech | Entendiendo tus propios modelos predictivos
Keepler Data Tech | Entendiendo tus propios modelos predictivos
 
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
UMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimensionUMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimension
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at Scale
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892
 
Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science Competitions
 
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkThe Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...
Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...
Honey, I Deep-shrunk the Sample Covariance Matrix! by Erk Subasi at QuantCon ...
 
Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018
Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018
Calin Constantinov - Neo4j - Bucharest Big Data Week Meetup - Bucharest 2018
 

More from Przemek Biecek

Responsible Machine Learning at Data Science Summit 2020
Responsible Machine Learning at Data Science Summit 2020Responsible Machine Learning at Data Science Summit 2020
Responsible Machine Learning at Data Science Summit 2020Przemek Biecek
 
Human-Centered Interpretable Machine Learning
Human-Centered Interpretable  Machine LearningHuman-Centered Interpretable  Machine Learning
Human-Centered Interpretable Machine LearningPrzemek Biecek
 
Human-Centered Interpretable Machine Learning
Human-Centered Interpretable  Machine LearningHuman-Centered Interpretable  Machine Learning
Human-Centered Interpretable Machine LearningPrzemek Biecek
 
Explore, Explain, and Debug aka Interpretable Machine Learning
Explore, Explain, and Debug aka Interpretable Machine LearningExplore, Explain, and Debug aka Interpretable Machine Learning
Explore, Explain, and Debug aka Interpretable Machine LearningPrzemek Biecek
 
XAI or DIE at Data Science Summit 2019
XAI or DIE at Data Science Summit 2019XAI or DIE at Data Science Summit 2019
XAI or DIE at Data Science Summit 2019Przemek Biecek
 
Machine learning meets Design. Design meets Machine learning.
Machine learning meets Design. Design meets Machine learning.Machine learning meets Design. Design meets Machine learning.
Machine learning meets Design. Design meets Machine learning.Przemek Biecek
 

More from Przemek Biecek (6)

Responsible Machine Learning at Data Science Summit 2020
Responsible Machine Learning at Data Science Summit 2020Responsible Machine Learning at Data Science Summit 2020
Responsible Machine Learning at Data Science Summit 2020
 
Human-Centered Interpretable Machine Learning
Human-Centered Interpretable  Machine LearningHuman-Centered Interpretable  Machine Learning
Human-Centered Interpretable Machine Learning
 
Human-Centered Interpretable Machine Learning
Human-Centered Interpretable  Machine LearningHuman-Centered Interpretable  Machine Learning
Human-Centered Interpretable Machine Learning
 
Explore, Explain, and Debug aka Interpretable Machine Learning
Explore, Explain, and Debug aka Interpretable Machine LearningExplore, Explain, and Debug aka Interpretable Machine Learning
Explore, Explain, and Debug aka Interpretable Machine Learning
 
XAI or DIE at Data Science Summit 2019
XAI or DIE at Data Science Summit 2019XAI or DIE at Data Science Summit 2019
XAI or DIE at Data Science Summit 2019
 
Machine learning meets Design. Design meets Machine learning.
Machine learning meets Design. Design meets Machine learning.Machine learning meets Design. Design meets Machine learning.
Machine learning meets Design. Design meets Machine learning.
 

Recently uploaded

Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Availablegargpaaro
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...HyderabadDolls
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridihmeghakumariji156
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...vershagrag
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 

Recently uploaded (20)

Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 

UseR 2017

  • 1. Show me your predictive model Why and how? Przemysław Biecek University of Warsaw Warsaw University of Technology
  • 2. Do we really need plots to understand numbers?
  • 5. Do we need plots to understand models? Is it all about accuracy?
  • 6. 1) …there is a lot of opportunity to do visualization for machine learning. Even many of the people working in the field don’t have good intuitions for how their systems work, and they need tools to inspect what they’re doing, debug, etc… https://eagereyes.org/blog/2017/eurovis-2017-conference-report-part-1 Robert Kosara 2) Understanding and trust - we need to understand models that makes important decisions. "Why Should I Trust You?": Explaining the Predictions of Any Classifier Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016) 3) Models have assumptions, and we need early warnings that something is wrong. Do we need plots to understand models?
  • 7. 1) …there is a lot of opportunity to do visualization for machine learning. Even many of the people working in the field don’t have good intuitions for how their systems work, and they need tools to inspect what they’re doing, debug, etc… https://eagereyes.org/blog/2017/eurovis-2017-conference-report-part-1 Robert Kosara 2) Understanding and trust - we need to understand models that makes important decisions. "Why Should I Trust You?": Explaining the Predictions of Any Classifier Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016) 3) Models have assumptions, and we need early warnings that something is wrong. Do we need plots to understand models?
  • 8. Yuan Tang, Masaaki Horikoshi, and Wenxuan Li. "ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages." The R Journal 8.2 (2016): 478-489. Package ggfortify lm
  • 9. Yuan Tang, Masaaki Horikoshi, and Wenxuan Li. "ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages." The R Journal 8.2 (2016): 478-489. Yuan Tang, Masaaki Horikoshi, and Wenxuan Li. "ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages." The R Journal 8.2 (2016): 478-489. aareg glmnet lm Package ggfortify
  • 10. Fitting Validation Prediction Life-cycle of a typical prognostic model + model fit + new data + training data + model skeleton Reprodu- cibility
  • 11. Fitting Validation Prediction Life-cycle of a typical prognostic model + model fit + new data + training data + model skeleton What affects predictions? - global interpretability - local interpretability Data validation - are there outliers? - is there a need for variable transformations? Validation of assumptions - residuals diagnostics - are they ok? Performance charts, is it a good model? - models comparisons - what is the predictive performance? Are convergence criteria satisfied? What are estimates for model parameters? Reprodu- cibility
  • 12. Prediction What affects the prediction for a single observation? (local interpretability) Which elements of the model are the most important ones? (global interpretability) How a single variable affects the expected output? (marginal interpretability)
  • 13. Package randomForestExplainer What is in a random forest? Aleksandra Paluszynska, Przemyslaw Biecek (2017) https://github.com/geneticsMiNIng/BlackBoxOpener John Ehrlinger (2015) ggRandomForests: Random Forests for Regression
  • 14. Package randomForestExplainer What is in a random forest? Aleksandra Paluszynska, Przemyslaw Biecek (2017) https://github.com/geneticsMiNIng/BlackBoxOpener John Ehrlinger (2015) ggRandomForests: Random Forests for Regression
  • 15. pdp: An R Package for Constructing Partial Dependence Plots. Brandon M. Greenwell (2017) https://journal.r-project.org/archive/2017/RJ-2017-016/index.html Package pdp - Partial Dependence Plots
  • 16. pdp: An R Package for Constructing Partial Dependence Plots. Brandon M. Greenwell (2017) https://journal.r-project.org/archive/2017/RJ-2017-016/index.html Package pdp - Partial Dependence Plots
  • 17. LIME: Local Interpretable Model-agnostic Explanations "Why Should I Trust You?" Explaining the Predictions of Any Classifier. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016). https://arxiv.org/pdf/1602.04938.pdf Port to R: Thomas Lin Pedersen (2017) https://github.com/thomasp85/lime 1. Generate a fake dataset around x. 2. Use black-box estimator to get target values y. 3. Train a new white-box estimator for (y,x). 4. Check prediction quality of a white-box classifier. 5. Use white-box estimator as an explanation of black-box model.
  • 18. LIME: Local Interpretable Model-agnostic Explanations "Why Should I Trust You?" Explaining the Predictions of Any Classifier. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016). https://arxiv.org/pdf/1602.04938.pdf Port to R: Thomas Lin Pedersen (2017) https://github.com/thomasp85/lime 1. Generate a fake dataset around x. 2. Use black-box estimator to get target values y. 3. Train a new white-box estimator for (y,x). 4. Check prediction quality of a white-box classifier. 5. Use white-box estimator as an explanation of black-box model.
  • 19. Fitting Are convergence criteria satisfied? What are estimates of model parameters? Are the assumptions satisfied? Do we need variable transformation?
  • 20. Package forestmodel broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames, David Robinson (2014) arXiv:1412.3565v2 Nick Kennedy (2017) https://github.com/NikNakk/forestmodel
  • 21. Package forestmodel regression model broom forest_model forestmodel tidy model broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames, David Robinson (2014) arXiv:1412.3565v2 Nick Kennedy (2017) https://github.com/NikNakk/forestmodel
  • 22. Package factorMerger Visualisation for post-hoc testing Agnieszka Sitko, Przemysław Biecek (2017) https://github.com/geneticsMiNIng/FactorMerger
  • 23. Agnieszka Sitko, Przemysław Biecek (2017) https://github.com/geneticsMiNIng/FactorMerger Package factorMerger Visualisation for post-hoc testing
  • 24. Validation Data validation - are there outliers? - is there a need for variable transformations? Validation of assumptions - residuals diagnostics - are they ok? Performance charts, is it a good model? - model comparisons - what is the performance?
  • 25. Package plotROC Michael Sachs (2016) http://sachsmc.github.io/plotROC
  • 26. Package plotROC Michael Sachs (2016) http://sachsmc.github.io/plotROC
  • 27. Alboukadel Kassambara, Marcin Kosiński (2016) https://github.com/kassambara/survminer Diagnostic plots for various residuals: "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas" and "scaledsch" Package survminer
  • 28. Alboukadel Kassambara, Marcin Kosiński (2016) https://github.com/kassambara/survminer Diagnostic plots for various residuals: "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas" and "scaledsch" Package survminer
  • 30. archivist - reproducible and recordable research library("archivist") model <- lm(Sepal.Length ~ Sepal.Width, data=iris) saveToLocalRepo(model) models <- asearch("pbiecek/graphGallery", patterns = "class:lm") modelsBIC <- sapply(models, BIC) sort(modelsBIC) ## 990861c7c27812ee959f10e5f76fe2c3 2a6e492cb6982f230e48cf46023e2e4f ## 39.05577 67.52735 ## 0a82efeb8250a47718cea9d7f64e5ae7 378237103bb60c58600fe69bed6c7f11 ## 189.73593 189.73593 ## 7f11e03539d48d35f7e7fe7780527ba7 c1b1ef7bcddefb181f79176015bc3931 ## 189.73593 189.73593 Przemysław Biecek, Marcin Kosiński (2015) https://github.com/pbiecek/archivist
  • 31. archivist - reproducible and recordable research library("archivist") model <- lm(Sepal.Length ~ Sepal.Width, data=iris) saveToLocalRepo(model) models <- asearch("pbiecek/graphGallery", patterns = "class:lm") modelsBIC <- sapply(models, BIC) sort(modelsBIC) ## 990861c7c27812ee959f10e5f76fe2c3 2a6e492cb6982f230e48cf46023e2e4f ## 39.05577 67.52735 ## 0a82efeb8250a47718cea9d7f64e5ae7 378237103bb60c58600fe69bed6c7f11 ## 189.73593 189.73593 ## 7f11e03539d48d35f7e7fe7780527ba7 c1b1ef7bcddefb181f79176015bc3931 ## 189.73593 189.73593 Przemysław Biecek, Marcin Kosiński (2015) https://github.com/pbiecek/archivist
  • 32. Fitting Validation Prediction Life-cycle of a typical prognostic model Reprodu- cibility archivist plotROC survminer randomForestExplainer pdp forestmodel factorMerger ggfortify lime What affects predictions? - global interpretability - local interpretability Data validation - are there outliers? - is there a need for variable transformations? Validation of assumptions - residuals diagnostics - are they ok? Performance charts, is it a good model? - models comparisons - what is the predictive performance? Are convergence criteria satisfied? What are estimates for model parameters?
  • 33. ELI5 is a Python library which allows to visualize and debug various Machine Learning models http://eli5.readthedocs.io/en/latest/index.html Visualizing statistical models: Removing the blindfold. Hadley Wickham, Dianne Cook, Heike Hofmann (2015) Statistical Analysis and Data Mining http://had.co.nz/stat645/model-vis.pdf Ideas on interpreting machine learning. Patrick Hall, Wen Phan, SriSatish Ambati (2017) https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning rms: Regression Modeling Strategies. Frank E Harrell (2009-2017) CRAN https://cran.r-project.org/web/packages/rms/index.html Thank you for your attention! https://imgs.xkcd.com/comics/like_im_five.png