Machine learning is ultimately a search for the combination of features, algorithm, and hyperparameters that results in the best-performing model. Oftentimes, this leads us to stay in our algorithmic comfort zones, or to resort to automated processes such as grid searches and random walks. Whether we stick to what we know or try many combinations, we are sometimes left wondering whether we have actually succeeded.
By enhancing model selection with visual diagnostics, data scientists can inject human guidance to steer the search process. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance gives us a peek into the high-dimensional space in which our models operate. As we continue to tune our models, trying to minimize both bias and variance, these glimpses allow us to be more strategic in our choices. The result is more effective modeling, speedier results, and a greater understanding of the underlying processes.
Visualization is an integral part of the data science workflow, but visual diagnostics are directly tied to machine learning transformers and models. The Yellowbrick library extends the scikit-learn API, providing a Visualizer object: an estimator that learns from data and produces a visualization as a result. In this tutorial, we will explore feature visualizers; visualizers for classification, clustering, and regression; and model analysis visualizers. We'll work through several examples and show how visual diagnostics steer model selection, making machine learning more informed and more effective.
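Every Visualizer follows the familiar scikit-learn estimator API: instantiate, fit, then render. A minimal sketch of that workflow, using the ParallelCoordinates feature visualizer and the wine dataset as stand-ins (both are illustrative assumptions, not prescribed by this tutorial):

    from sklearn.datasets import load_wine
    from yellowbrick.features import ParallelCoordinates

    # Any labeled tabular dataset would do; wine is just an example
    X, y = load_wine(return_X_y=True)

    # Visualizers are estimators: instantiate, fit, then show
    visualizer = ParallelCoordinates(normalize="standard")
    visualizer.fit_transform(X, y)  # learns from the data and draws the plot
    visualizer.show()               # finalizes and renders the figure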
2. Dr. Rebecca Bilbro
Head of Data Science, ICX Media
Co-creator, Scikit-Yellowbrick
Author, Applied Text Analysis with Python
@rebeccabilbro
3. The machine learning problem: given a set of n data samples, each represented by more than one number, create a model that is able to predict properties of as-yet unseen samples.
4. Ask me for my strong opinions about Random Forests.
5. The Model Selection Triple (Arun Kumar, http://bit.ly/2abVNrI):
● Feature Analysis
● Algorithm Selection
● Hyperparameter Tuning
9. Radial Visualization
Features pull instances towards their position on the circle in proportion to their normalized numerical value for that instance.
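A minimal RadViz sketch; the iris data and class names are illustrative assumptions:

    from sklearn.datasets import load_iris
    from yellowbrick.features import RadViz

    # Load an example dataset; any labeled numeric features would do
    X, y = load_iris(return_X_y=True)

    visualizer = RadViz(classes=["setosa", "versicolor", "virginica"])
    visualizer.fit(X, y)      # learn the normalization from the data
    visualizer.transform(X)   # project instances onto the circle and draw
    visualizer.show()         # finalize and render the figure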
12. Feature Importance Plot
Visualize the relative importance of each feature to the model. Identify weak features or combinations of features that are candidates for removal.
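The FeatureImportances visualizer wraps any model that exposes feature_importances_ or coef_; a sketch using a random forest (the model and dataset choices are assumptions):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from yellowbrick.model_selection import FeatureImportances

    X, y = load_iris(return_X_y=True)

    # Fits the wrapped model, then plots its importances as a bar chart
    viz = FeatureImportances(RandomForestClassifier(n_estimators=100))
    viz.fit(X, y)
    viz.show()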
13. Rank2D
Visualize pairwise relationships as a heatmap. Pearson shows us strong correlations and potential collinearity; covariance helps us understand how pairs of features vary together.
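A Rank2D sketch; swapping algorithm="pearson" for "covariance" produces the covariance ranking instead:

    from sklearn.datasets import load_iris
    from yellowbrick.features import Rank2D

    X, y = load_iris(return_X_y=True)

    # Rank pairwise feature relationships with Pearson correlation
    visualizer = Rank2D(algorithm="pearson")
    visualizer.fit(X, y)
    visualizer.transform(X)  # computes the ranking and draws the heatmap
    visualizer.show()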
20. Evaluating Classifiers
Do we want to minimize false positives?
precision = true positives / (true positives + false positives)
Do we want to minimize false negatives?
recall = true positives / (true positives + false negatives)
Will we need to compare many models?
F1 score = 2 * ((precision * recall) / (precision + recall))
Are the classes imbalanced?
support = number of actual instances of each class
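Yellowbrick's ClassificationReport visualizer renders these per-class metrics as a heatmap; a minimal sketch, assuming the iris data and a logistic regression stand-in:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from yellowbrick.classifier import ClassificationReport

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    viz = ClassificationReport(LogisticRegression(max_iter=1000), support=True)
    viz.fit(X_train, y_train)   # fit the wrapped model
    viz.score(X_test, y_test)   # compute per-class metrics and draw
    viz.show()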
22. Class Prediction Error
Do I care about being right (or about not being wrong) for some categories more than for others?
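The ClassPredictionError visualizer answers this by showing, for each actual class, how its instances were predicted; a sketch under the same assumptions as above:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from yellowbrick.classifier import ClassPredictionError

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Stacked bars: predicted classes broken down by actual class
    viz = ClassPredictionError(LogisticRegression(max_iter=1000))
    viz.fit(X_train, y_train)
    viz.score(X_test, y_test)
    viz.show()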
23. Evaluating Regressors
R²: How well does the model describe the training data? How well does the model predict out-of-sample data?
MSE/ASE (mean/average squared error): How sensitive is the model to outliers?
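Both metrics are available directly from scikit-learn; a quick sketch (the linear model and diabetes data are stand-ins):

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("R^2:", r2_score(y_test, y_pred))            # out-of-sample fit
    print("MSE:", mean_squared_error(y_test, y_pred))  # squared-error loss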
24. Prediction Error Plots
Visualize prediction errors as a scatterplot of the predicted & actual values. Visualize the line of best fit & compare to the 45° line.
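A PredictionError sketch; the lasso model and diabetes data are assumptions for illustration:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split
    from yellowbrick.regressor import PredictionError

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Scatter of predicted vs. actual, with best-fit and 45° identity lines
    viz = PredictionError(Lasso())
    viz.fit(X_train, y_train)
    viz.score(X_test, y_test)
    viz.show()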
25. Plotting Residuals
Are residuals random? We should not be able to predict error! Visualize train and test data with different colors.
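A ResidualsPlot sketch (ridge regression as a stand-in); train and test points are drawn in different colors automatically:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split
    from yellowbrick.regressor import ResidualsPlot

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    viz = ResidualsPlot(Ridge())
    viz.fit(X_train, y_train)   # residuals for the training data
    viz.score(X_test, y_test)   # residuals for the test data
    viz.show()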
33. Hyperparameters
● When we call fit() on an estimator, it learns the parameters of the algorithm that make it fit the data best.
● However, some parameters are not directly learned within an estimator, e.g.:
○ depth of a decision tree
○ alpha for regularization
○ kernel for support vector machines
○ number of clusters for centroidal clustering
● These parameters are often referred to as hyperparameters.
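The distinction in code (a minimal sketch; the max_depth value is arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # max_depth is a hyperparameter: chosen before fitting
    model = DecisionTreeClassifier(max_depth=5)

    # fit() learns the parameters: the actual splits of the tree
    model.fit(X, y)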
37. K-selection with Yellowbrick
Higher silhouette scores mean denser, more separated clusters. The elbow shows the best value of k… or suggests a different algorithm.
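A sketch of both approaches with k-means; the dataset and the candidate range for k are assumptions:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer

    X, _ = load_iris(return_X_y=True)

    # Elbow method: fit k-means for k in [2, 10) and plot the distortion score
    viz = KElbowVisualizer(KMeans(), k=(2, 10))
    viz.fit(X)
    viz.show()

    # Silhouette analysis for a single candidate k (here, k=3)
    viz = SilhouetteVisualizer(KMeans(n_clusters=3))
    viz.fit(X)
    viz.show()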
38. Alpha selection with Yellowbrick
Should I use Lasso, Ridge, or ElasticNet? Is regularization even working?
More alpha => less complexity: reduced variance, but increased bias.
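An AlphaSelection sketch using cross-validated lasso; the alpha grid is an arbitrary assumption:

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LassoCV
    from yellowbrick.regressor import AlphaSelection

    X, y = load_diabetes(return_X_y=True)

    # AlphaSelection wraps regressors with built-in cross-validation
    # (LassoCV, RidgeCV, ElasticNetCV) and plots error against alpha
    alphas = np.logspace(-10, 1, 200)
    viz = AlphaSelection(LassoCV(alphas=alphas))
    viz.fit(X, y)
    viz.show()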
42. Yellowbrick is an open source project supported by a community who will gratefully and humbly accept any contributions you might make to the project. We want to be the best first place to contribute.