This document discusses advanced machine learning workflows that can be implemented in WhizzML, a domain-specific language for automating machine learning on BigML. It provides examples of implementing best-first feature selection, stacked generalization, and gradient boosting as workflows composed of machine learning operations. Algorithms like these, built from iterative modeling, prediction, and evaluation steps, can be automated and scaled using WhizzML's composable primitives and backend infrastructure. The deck also highlights how non-trivial model selection, automation of routine tasks, and advanced algorithms become possible with WhizzML workflows.
2. Outline
1 Introduction
2 Advanced Workflows
3 A WhizzML Implementation of Best-first Feature Selection
4 Even More Workflows!
5 Stacked Generalization in WhizzML
6 A Brief Look at Gradient Boosting in WhizzML
7 Wrapping Up
#VSSML16 Automating Machine Learning September 2016 2 / 34
4. What Do We Know About WhizzML?
• It's a complete programming language
• Machine learning "operations" are first-class
• Those operations are performed in BigML's backend
One line of code to perform API requests
We get scale "for free"
• Everything is Composable
Functions
Libraries
The Web Interface
5. What Can We Do With It?
• Non-trivial Model Selection
n-fold cross-validation
Comparison of model types (tree, ensemble, logistic)
• Automation of Drudgery
One-click retraining/validation
Standardized dataset transformations / cleaning
• Sure, but what else?
7. Algorithms as Workflows
• Many ML algorithms can be thought of as workflows
• In these algorithms, machine learning operations are the
primitives
Make a model
Make a prediction
Evaluate a model
• Many such algorithms can be implemented in WhizzML
Reap the advantages of BigML's infrastructure
Once implemented, the workflow is language-agnostic
8. Examples: Best-first Feature Selection
Objective: Select the n best features for modeling your data
• Initialize a set S of used features as the empty set
• Split your dataset into training and test sets
• For i in 1 … n
For each feature f not in S, model and evaluate with feature set
S + f
Greedily select f̂, the feature with the best performance, and set
S ← S + f̂
https://github.com/whizzml/examples/tree/master/best-first
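The greedy loop above is language-agnostic. A minimal local sketch in Python, with a hypothetical `evaluate` scoring function standing in for the model/evaluation API calls:

```python
def best_first(features, evaluate, n):
    """Greedy best-first selection: repeatedly add the candidate
    feature whose addition to the selected set scores best under
    `evaluate` (higher is better)."""
    selected = []
    candidates = list(features)
    for _ in range(min(n, len(candidates))):
        # Model and evaluate once per remaining candidate
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        _, best = max(scored)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy scorer: the "performance" of a feature set is the sum of
# made-up per-feature values, so features are picked by value.
values = {"age": 0.3, "income": 0.9, "zip": 0.1}
toy_eval = lambda fs: sum(values[f] for f in fs)
print(best_first(["age", "income", "zip"], toy_eval, 2))  # ['income', 'age']
```

In the WhizzML version, `evaluate` is replaced by real model and evaluation requests against the test split, scored by the evaluation's average_phi.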
10. Modeling
First, construct the candidate models. selected holds the features
that have already been selected, and potentials are the
candidates we might add on this iteration.
(define (make-models dataset-id obj-field selected potentials)
(let (model-req {"dataset" dataset-id "objective_field" obj-field}
make-req (lambda (fid)
(assoc model-req "input_fields" (cons fid selected)))
all-reqs (map make-req potentials))
(create-and-wait* "model" all-reqs)))
11. Evaluation
Now, conduct the evaluations. potentials is again the list
of potential features to add, and model-ids is the list of
corresponding model-ids created in the last step.
(define (select-feature test-dataset-id potentials model-ids)
(let (eval-req {"dataset" test-dataset-id}
make-req (lambda (mid) (assoc eval-req "model" mid))
all-reqs (map make-req model-ids)
evs (map fetch (create-and-wait* "evaluation" all-reqs))
vs (map (lambda (ev) (get-in ev ["result" "model" "average_phi"])) evs)
value-map (make-map potentials vs) ;; e.g., {"000000" 0.8 "000001" 0.7}
max-val (get-max vs)
choose-best (lambda (id) (if (= max-val (get value-map id)) id false)))
(some choose-best potentials)))
12. Main Loop
The main loop of the algorithm. Set up your objective id,
inputs, and training and test datasets. Initialize the selected
features to the empty set and iteratively call the previous two
functions.
(define (select-features dataset-id nfeatures)
(let (obj-id (dataset-get-objective-id dataset-id)
input-ids (default-inputs dataset-id obj-id)
splits (split-dataset dataset-id 0.5)
train-id (nth splits 0)
test-id (nth splits 1))
(loop (selected []
potentials input-ids)
(if (or (>= (count selected) nfeatures) (empty? potentials))
(feature-names dataset-id selected)
(let (model-ids (make-models train-id obj-id selected potentials)
next-feat (select-feature test-id potentials model-ids))
(recur (cons next-feat selected)
(filter (lambda (id) (not (= id next-feat))) potentials)))))))
14. Examples: Stacked Generalization
Objective: Improve predictions by modeling the output scores of
multiple trained models.
• Create a training and a holdout set
• Create n different models on the training set (with some difference
among them; e.g., single-tree vs. ensemble vs. logistic regression)
• Make predictions from those models on the holdout set
• Train a model to predict the class based on the other models'
predictions
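The recipe above can be sketched locally before worrying about API calls. In this Python toy, the "model types" and the metamodel are stand-ins (a real stack would use trees, ensembles, and logistic regressions), but the data flow is the one described:

```python
def make_stack(train, holdout, base_fits, meta_fit):
    """Fit base models on the training half, then fit the metamodel
    on the base models' predictions over the holdout half."""
    models = [fit(train) for fit in base_fits]
    meta_rows = [[m(x) for m in models] for x, _ in holdout]
    meta_labels = [y for _, y in holdout]
    return models, meta_fit(meta_rows, meta_labels)

def stack_predict(models, metamodel, x):
    # The metamodel sees only the base models' scores, not x itself
    return metamodel([m(x) for m in models])

# Stand-in "model types": a raw-score model and a threshold model
fit_raw = lambda train: (lambda x: x)
def fit_threshold(train):
    mu = sum(x for x, _ in train) / len(train)
    return lambda x: 1.0 if x > mu else 0.0

# Stand-in metamodel: average the base scores and round
fit_meta = lambda rows, labels: (
    lambda scores: 1 if sum(scores) / len(scores) > 0.5 else 0)

train = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
holdout = [(0.15, 0), (0.85, 1)]
models, meta = make_stack(train, holdout, [fit_raw, fit_threshold], fit_meta)
print(stack_predict(models, meta, 0.95), stack_predict(models, meta, 0.05))
```

The point of the split is the same as in the slides: the metamodel must be trained on holdout predictions, not on predictions over the base models' own training data.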
15. Examples: Randomized Parameter Optimization
Objective: Find the best set of parameters for a machine learning
algorithm
• Do:
Generate a random set of parameters for an ML algorithm
Do 10-fold cross-validation with those parameters
• Until you get a set of parameters that performs "well" or you get
bored
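A hedged sketch of that loop in Python. The parameter name and scoring function are invented for illustration; in WhizzML, `cross_validate` would be the 10-fold evaluation workflow:

```python
import random

def random_search(sample_params, cross_validate, good_enough, max_tries):
    """Keep drawing random parameter sets until one cross-validates
    at or above `good_enough`, or we get bored after `max_tries`."""
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        params = sample_params()
        score = cross_validate(params)
        if score > best_score:
            best, best_score = params, score
        if best_score >= good_enough:
            break
    return best, best_score

# Toy problem: one integer parameter whose best value is 5
random.seed(0)
sample = lambda: {"depth": random.randint(1, 10)}
cv = lambda p: 1.0 - abs(p["depth"] - 5) / 10.0
params, score = random_search(sample, cv, 0.9, 100)
print(params["depth"], round(score, 2))
```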
16. Examples: SMACdown
Objective: Find the best set of parameters even more quickly!
• Do:
Generate several random sets of parameters for an ML algorithm
Do 10-fold cross-validation with those parameters
Learn a predictive model to predict performance from parameter
values
Use the model to help you select the next set of parameters to
evaluate
• Until you get a set of parameters that performs "well" or you get
bored
Coming soon to a WhizzML gallery near you!
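One SMAC-style step can be sketched with a deliberately crude surrogate: 1-nearest-neighbour over past trials (a real SMACdown would learn a proper regression model of performance from parameter values):

```python
import random

def smac_step(history, sample_params, n_candidates=20):
    """Propose random candidates and keep the one the surrogate
    predicts will score best. `history` is a list of
    (param_value, cv_score) pairs from earlier cross-validations."""
    def predicted_score(p):
        # 1-NN surrogate: the score of the closest evaluated point
        _, score = min(history, key=lambda h: abs(h[0] - p))
        return score
    candidates = [sample_params() for _ in range(n_candidates)]
    return max(candidates, key=predicted_score)

random.seed(1)
history = [(1.0, 0.2), (8.0, 0.9)]   # parameters near 8 looked good
pick = smac_step(history, lambda: random.uniform(0.0, 10.0))
print(round(pick, 2))
```

The picked candidate lands near the promising region of parameter space, which is exactly why model-guided search gets to good parameters faster than blind random search.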
17. Examples: Boosting
• General idea: Iteratively model the dataset
Each iteration is trained on the mistakes of previous iterations
Said another way, the objective changes each iteration
The final model is a summation of all iterations
• Lots of variations on this theme
Adaboost
Logitboost
Martingale Boosting
Gradient Boosting
• Let's take a look at a WhizzML implementation of the last of these
19. A Stacked generalization library: creating the stack
;; Splits the given dataset, using half of it to create
;; a heterogeneous collection of models and the other
;; half to train a tree that predicts based on those other
;; models' predictions. Returns a map with the collection
;; of models (under the key "models") and the metamodel's id
;; as the value of the key "metamodel". The key "result"
;; has as value a boolean flag indicating whether the
;; process was successful.
(define (make-stack dataset-id)
(let (ids (split-dataset-and-wait dataset-id 0.5)
train-id (nth ids 0)
hold-id (nth ids 1)
models (create-stack-models train-id)
id (create-stack-predictions models hold-id)
orig-fields (model-inputs (head models))
obj-id (dataset-get-objective-id train-id)
meta-id (create-and-wait-model {"dataset" id
"excluded_fields" orig-fields
"objective_field" obj-id})
success? (resource-done? (fetch meta-id)))
{"models" models "metamodel" meta-id "result" success?}))
20. A Stacked generalization library: using the stack
;; Use the models and metamodel computed by make-stack
;; to make a prediction on the input-data map. Returns
;; the identifier of the prediction object.
(define (make-stack-prediction models meta-model input-data)
(let (preds (map (lambda (m) (create-prediction {"model" m
"input_data" input-data}))
models)
preds (map (lambda (p)
(head (values (get (fetch p) "prediction"))))
preds)
meta-input (make-map (model-inputs meta-model) preds))
(create-prediction {"model" meta-model "input_data" meta-input})))
21. A Stacked generalization library: auxiliary functions
;; Extract a batchprediction's associated dataset of results
(define (batch-dataset id)
(wait-forever (get (fetch id) "output_dataset_resource")))
;; Create a batchprediction for the given model and dataset,
;; with a map of additional options and using defaults appropriate
;; for model stacking
(define (make-batch ds-id mod-id opts)
(create-batchprediction (merge {"all_fields" true
"output_dataset" true
"dataset" ds-id
"model" (wait-forever mod-id)}
opts)))
;; Auxiliary function extracting the input_fields of a model
(define (model-inputs mod-id)
(get (fetch mod-id) "input_fields"))
25. The Main Loop
• Given the currently predicted class probabilities, compute a
gradient step that will push those probabilities in the right direction
• Learn regression trees to represent this step over the training set
• Make a prediction with each tree
• Sum this prediction with all gradient steps so far to get a set of
scores for each point in the training data (one score for each class)
• Apply the softmax function to these sums to get a set of class
probabilities for each point.
• Iterate!
Clone it here:
https://github.com/whizzml/examples/tree/master/gradient-boosting
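Stripped of API calls, the loop has this shape. The sketch below substitutes a trivial "regression tree" (a constant fit to the mean gradient per class) so it runs locally; the iteration structure, the y − p(k) gradients, and the softmax over the summed steps are as described:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def boost(labels, n_classes, n_rounds, rate=0.5):
    n = len(labels)
    sums = [[0.0] * n_classes for _ in range(n)]  # summed gradient steps
    for _ in range(n_rounds):
        probs = [softmax(s) for s in sums]        # current class probabilities
        for k in range(n_classes):
            # Gradient for class k at each point: y - p(k)
            grads = [(1.0 if labels[i] == k else 0.0) - probs[i][k]
                     for i in range(n)]
            step = sum(grads) / n                 # constant stand-in "tree"
            for i in range(n):
                sums[i][k] += rate * step         # add this step to the sums
    return [softmax(s) for s in sums]

# With a constant learner every point gets the same scores, so the
# probabilities converge to the class priors (3:1 here)
probs = boost([0, 0, 0, 1], n_classes=2, n_rounds=200)
print([round(p, 2) for p in probs[0]])  # [0.75, 0.25]
```

Replacing the constant with a real regression tree lets each point receive its own step, which is what makes the ensemble fit structure rather than just the priors.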
26. What will this look like in WhizzML?
• Several things here are machine learning operations
Constructing gradient models
Making predictions
• But several are not
Summing the gradient steps
Computing softmax probabilities
Computing gradients
• We don't want to do those things locally (data size, resource
concerns)
• Can we do these things on BigML's infrastructure?
27. Compute Gradients From Probabilities
• Let's just focus on computing the gradients for a moment
• Get the predictions from the previous iteration
The sum of all of the previous gradient steps is stored in a column
If this is the first iteration, assume the uniform distribution
• The gradient for class k is just y − p(k), where y is 1 if the point's class
is k and 0 otherwise.
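As a concrete check of the y − p(k) rule (class names here are made up):

```python
def class_gradients(true_class, probs, classes):
    """Gradient per class: 1 - p(k) on the point's own class,
    -p(k) on every other class."""
    return [(1.0 if c == true_class else 0.0) - p
            for c, p in zip(classes, probs)]

# A point of class "b": the gradient pushes "b" up and the rest down
print(class_gradients("b", [0.2, 0.5, 0.3], ["a", "b", "c"]))
# [-0.2, 0.5, -0.3]
```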
30. Aside: WhizzML + Flatline
• How can we do computations on the data?
Use Flatline: a language for data manipulation
Executed in BigML as a Dataset Transformation
https://github.com/bigmlcom/flatline/blob/master/user-manual.md
• Benefits
Arbitrary operations on the data are now API calls
Computational details are taken care of
Upload your data once, do anything to it
• Flatline is a First-class Citizen of WhizzML
31. Creating a new feature in Flatline
• We need to subtract one column value from another
• Flatline provides the f operator to get a named field value from
any row
(- (f "actual") (f "predicted"))
• But remember, if we have n classes, we also have n gradients to
construct!
• Enter WhizzML!
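This is where generating Flatline from a program pays off: with n classes you build the n subtraction expressions programmatically instead of by hand. A Python sketch of the same string-building idea (the actual_*/predicted_* field names are hypothetical):

```python
def gradient_expressions(classes):
    """One Flatline expression per class, each computing
    actual_k - predicted_k from that class's columns."""
    return ['(- (f "actual_%s") (f "predicted_%s"))' % (c, c)
            for c in classes]

for expr in gradient_expressions(["setosa", "versicolor", "virginica"]):
    print(expr)
```

In the WhizzML version, these strings become the new-field definitions of a dataset transformation request, so all n gradient columns are computed in one API call.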
34. What Have We Learned?
• You can implement workflows of arbitrary complexity with
WhizzML
• The power of WhizzML with Flatline
• Editorial: The Commodification of Machine Learning Algorithms
Every language has its own ML algorithms now
With WhizzML, implement once and use anywhere
Never worry about architecture again