Advanced WhizzML Workflows
The BigML Team
May 2016
The BigML Team Advanced WhizzML Workflows May 2016 1 / 34
Outline
1 Introduction
2 Advanced Workflows
3 A WhizzML Implementation of Best-first Feature Selection
4 Even More Workflows!
5 Stacked Generalization in WhizzML
6 A Brief Look at Gradient Boosting in WhizzML
7 Wrapping Up
What Do We Know About WhizzML?
• It’s a complete programming language
• Machine learning “operations” are first-class
• Those operations are performed in BigML’s backend
    One line of code performs an API request
    We get scale “for free”
• Everything is composable
    Functions
    Libraries
    The Web Interface
What Can We Do With It?
• Non-trivial Model Selection
    n-fold cross validation
    Comparison of model types (tree, ensemble, logistic)
• Automation of Drudgery
    One-click retraining/validation
    Standardized dataset transformations / cleaning
• Sure, but what else?
Algorithms as Workflows
• Many ML algorithms can be thought of as workflows
• In these algorithms, machine learning operations are the primitives
    Make a model
    Make a prediction
    Evaluate a model
• Many such algorithms can be implemented in WhizzML
    Reap the advantages of BigML’s infrastructure
    Once implemented, the algorithm is language-agnostic
Examples: Best-first Feature Selection
Objective: Select the n best features for modeling your data
• Initialize a set S of used features as the empty set
• Split your dataset into training and test sets
• For i in 1 … n
    For each feature f not in S, model and evaluate with feature set S + f
    Greedily select f̂, the feature with the best performance, and set S ← S + f̂
https://github.com/whizzml/examples/tree/master/best-first
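The greedy loop above can be sketched locally in Python. This is only an illustration of the selection logic, not the WhizzML implementation: the names best_first_selection, toy_score and WEIGHTS are hypothetical, and the scoring function stands in for "model on the training set, evaluate on the test set".

```python
from typing import Callable, List, Sequence


def best_first_selection(
    features: Sequence[str],
    n: int,
    score: Callable[[List[str]], float],
) -> List[str]:
    """Greedily pick up to n features, adding the best-scoring one each round."""
    selected: List[str] = []
    candidates = list(features)
    while len(selected) < n and candidates:
        # Score every candidate set S + f and keep the best feature.
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up weights: a stand-in for a real train-and-evaluate step.
WEIGHTS = {"age": 0.5, "income": 0.3, "zip": 0.05, "height": 0.1}


def toy_score(feature_set: List[str]) -> float:
    """Pretend evaluation metric: higher is better."""
    return sum(WEIGHTS[f] for f in feature_set)


chosen = best_first_selection(list(WEIGHTS), 2, toy_score)
print(chosen)
```

Each round costs one model and one evaluation per remaining candidate, which is why running those steps in a backend that can process them in bulk matters.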
Modeling
First, construct a batch of models. selected is the list of features
that have already been chosen, and potentials is the list of
candidates we might add on this iteration.
(define (make-models dataset-id obj-field selected potentials)
  (let (model-req {"dataset" dataset-id "objective_field" obj-field}
        make-req (lambda (fid)
                   (assoc model-req "input_fields" (cons fid selected)))
        all-reqs (map make-req potentials))
    (create-and-wait* "model" all-reqs)))
Evaluation
Now, conduct the evaluations. potentials is again the list
of potential features to add, and model-ids is the list of
corresponding model ids created in the last step.
(define (select-feature test-dataset-id potentials model-ids)
  (let (eval-req {"dataset" test-dataset-id}
        make-req (lambda (mid) (assoc eval-req "model" mid))
        all-reqs (map make-req model-ids)
        evs (map fetch (create-and-wait* "evaluation" all-reqs))
        vs (map (lambda (ev) (get-in ev ["result" "model" "average_phi"])) evs)
        value-map (make-map potentials vs) ;; e.g., {"000000" 0.8 "000001" 0.7}
        max-val (get-max vs)
        choose-best (lambda (id) (if (= max-val (get value-map id)) id false)))
    (some choose-best potentials)))
Main Loop
The main loop of the algorithm. Set up the objective id, the
input ids, and the training and test datasets. Initialize the selected
features to the empty list and iteratively call the previous two
functions, modeling on the training split and evaluating on the test split.
(define (select-features dataset-id nfeatures)
  (let (obj-id (dataset-get-objective-id dataset-id)
        input-ids (default-inputs dataset-id obj-id)
        splits (split-dataset dataset-id 0.5)
        train-id (nth splits 0)
        test-id (nth splits 1))
    (loop (selected []
           potentials input-ids)
      (if (or (>= (count selected) nfeatures) (empty? potentials))
        (feature-names dataset-id selected)
        (let (model-ids (make-models train-id obj-id selected potentials)
              next-feat (select-feature test-id potentials model-ids))
          (recur (cons next-feat selected)
                 (filter (lambda (id) (not (= id next-feat))) potentials)))))))
Examples: Stacked Generalization
Objective: Improve predictions by modeling the output scores of
multiple trained models.
• Create a training and a holdout set
• Create n different models on the training set (with some difference
among them; e.g., single-tree vs. ensemble vs. logistic regression)
• Make predictions from those models on the holdout set
• Train a model to predict the class based on the other models’
predictions
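A minimal local sketch of these four steps in Python, under loud assumptions: the base learners (fit_stump) and the metamodel (fit_meta) are toy stand-ins invented for this illustration, not BigML models, and the data is made up so that the class is 1 only when both features are large.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]
Model = Callable[[Row], int]


def fit_stump(rows: List[Row], labels: List[int], feature: int) -> Model:
    """Toy base learner: cut one feature halfway between the two class means.

    Assumes class 1 has the larger mean for this feature.
    """
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: int(row[feature] > cut)


def fit_meta(base_preds: List[Tuple[int, ...]], labels: List[int]):
    """Toy metamodel: majority class seen for each tuple of base predictions."""
    votes: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for preds, y in zip(base_preds, labels):
        votes[preds][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}
    return lambda preds: table.get(preds, default)


def make_stack(train, train_y, hold, hold_y):
    """Train base models on the training set, then the metamodel on the holdout."""
    models = [fit_stump(train, train_y, 0), fit_stump(train, train_y, 1)]
    hold_preds = [tuple(m(row) for m in models) for row in hold]
    meta = fit_meta(hold_preds, hold_y)
    return lambda row: meta(tuple(m(row) for m in models))


train = [(1, 1), (1, 9), (9, 1), (9, 9)]
train_y = [0, 0, 0, 1]
hold = [(2, 2), (2, 8), (8, 2), (8, 8)]
hold_y = [0, 0, 0, 1]

stack_predict = make_stack(train, train_y, hold, hold_y)
print(stack_predict((7, 7)), stack_predict((7, 3)))
```

Neither single-feature base model can represent the "both features large" rule on its own; the metamodel recovers it from the pair of base predictions, which is the point of stacking.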
Examples: Randomized Parameter Optimization
Objective: Find the best set of parameters for a machine learning
algorithm
• Do:
Generate a random set of parameters for an ML algorithm
Do 10-fold cross-validation with those parameters
• Until you get a set of parameters that performs “well” or you get
bored
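The do-until loop above, sketched in Python. The function names, the parameter ranges, and the scoring function are assumptions made for illustration; toy_cv_score merely stands in for a real 10-fold cross-validation, and "getting bored" is modeled as a maximum number of tries.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, float]


def random_search(
    sample: Callable[[random.Random], Params],
    score: Callable[[Params], float],
    target: float,
    max_tries: int,
    seed: int = 0,
) -> Tuple[Optional[Params], float]:
    """Draw random parameter sets until one scores 'well' or we run out of tries."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_tries):
        params = sample(rng)
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
        if best_score >= target:
            break
    return best_params, best_score


def sample_params(rng: random.Random) -> Params:
    """Hypothetical search space: a tree depth and a learning rate."""
    return {"depth": rng.randint(1, 10), "rate": rng.uniform(0.01, 1.0)}


def toy_cv_score(params: Params) -> float:
    """Stand-in for cross-validated performance; the best possible value is 0."""
    return -abs(params["depth"] - 6) - abs(params["rate"] - 0.1)


best_params, best_score = random_search(sample_params, toy_cv_score, -0.5, 200)
print(best_params, best_score)
```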
Examples: SMACdown
Objective: Find the best set of parameters even more quickly!
• Do:
Generate several random sets of parameters for an ML algorithm
Do 10-fold cross-validation with those parameters
Learn a predictive model to predict performance from parameter
values
Use the model to help you select the next set of parameters to
evaluate
• Until you get a set of parameters that performs “well” or you get
bored
Coming soon to a WhizzML gallery near you!
Examples: Boosting
• General idea: Iteratively model the dataset
    Each iteration is trained on the mistakes of previous iterations
    Said another way, the objective changes each iteration
    The final model is a summation of all iterations
• Lots of variations on this theme
    AdaBoost
    LogitBoost
    Martingale Boosting
    Gradient Boosting
• Let’s take a look at a WhizzML implementation of the last of these
A Stacked generalization library: creating the stack
;; Splits the given dataset, using half of it to create
;; a heterogeneous collection of models and the other
;; half to train a tree that predicts based on those other
;; models' predictions. Returns a map with the collection
;; of models (under the key "models") and the metamodel
;; as the value of the key "metamodel". The key "result"
;; has as value a boolean flag indicating whether the
;; process was successful.
(define (make-stack dataset-id)
  (let (ids (split-dataset-and-wait dataset-id 0.5)
        train-id (nth ids 0)
        hold-id (nth ids 1)
        models (create-stack-models train-id)
        id (create-stack-predictions models hold-id)
        orig-fields (model-inputs (head models))
        obj-id (dataset-get-objective-id train-id)
        meta-id (create-and-wait-model {"dataset" id
                                        "excluded_fields" orig-fields
                                        "objective_field" obj-id})
        success? (resource-done? (fetch meta-id)))
    {"models" models "metamodel" meta-id "result" success?}))
A Stacked generalization library: using the stack
;; Use the models and metamodel computed by make-stack
;; to make a prediction on the input-data map. Returns
;; the identifier of the prediction object.
(define (make-stack-prediction models meta-model input-data)
  (let (preds (map (lambda (m) (create-prediction {"model" m
                                                   "input_data" input-data}))
                   models)
        preds (map (lambda (p)
                     (head (values (get (fetch p) "prediction"))))
                   preds)
        meta-input (make-map (model-inputs meta-model) preds))
    (create-prediction {"model" meta-model "input_data" meta-input})))
A Stacked generalization library: auxiliary functions
;; Extract from a batchprediction its associated dataset of results
(define (batch-dataset id)
  (wait-forever (get (fetch id) "output_dataset_resource")))

;; Create a batchprediction for the given model and dataset,
;; with a map of additional options and using defaults appropriate
;; for model stacking
(define (make-batch ds-id mod-id opts)
  (create-batchprediction (merge {"all_fields" true
                                  "output_dataset" true
                                  "dataset" ds-id
                                  "model" (wait-forever mod-id)}
                                 opts)))

;; Auxiliary function extracting the input fields of a model
(define (model-inputs mod-id)
  (get (fetch mod-id) "input_fields"))
Library-based scripts
Script for creating the models:
(define stack (make-stack dataset-id))
Script for predictions using the stack:
(define (make-prediction exec-id input-data)
  (let (exec (fetch exec-id)
        stack (nth (head (get-in exec ["execution" "outputs"])) 1)
        models (get stack "models")
        metamodel (get stack "metamodel"))
    (when (get stack "result")
      (try (make-stack-prediction models metamodel input-data)
           (catch e (log-info "Error: " e) false)))))
(define prediction-id (make-prediction exec-id input-data))
(define prediction (when prediction-id (fetch prediction-id)))
https://github.com/whizzml/examples/tree/master/stacked-generalizati
The Main Loop
• Given the currently predicted class probabilities, compute a
gradient step that will push those probabilities in the right direction
• Learn regression trees to represent this step over the training set
• Make a prediction with each tree
• Sum this prediction with all gradient steps so far to get a set of
scores for each point in the training data (one score for each class)
• Apply the softmax function to these sums to get a set of class
probabilities for each point.
• Iterate!
Clone it here:
https://github.com/whizzml/examples/tree/master/gradient-boosting
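The summing and softmax steps of this loop can be sketched for a single training point in Python. This is a local illustration only: in the algorithm the step comes from the regression trees' predictions, whereas here a made-up step is supplied directly, and the names softmax and update_scores are chosen for this sketch.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn one score per class into class probabilities."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_scores(scores: List[float], step: List[float]) -> List[float]:
    """Add this iteration's predicted gradient step to the running sums."""
    return [s + g for s, g in zip(scores, step)]


scores = [0.0, 0.0, 0.0]          # before the first iteration
start_probs = softmax(scores)     # uniform over the three classes
scores = update_scores(scores, [0.4, -0.3, -0.1])  # illustrative step
probs = softmax(scores)
print(start_probs, probs)
```

With all scores at zero the probabilities are uniform, which matches the assumption made for the first iteration.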
What will this look like in WhizzML?
• Several things here are machine learning operations
Constructing gradient models
Making predictions
• But several are not
Summing the gradient steps
Computing softmax probabilities
Computing gradients
• We don’t want to do those things locally (data size, resource
concerns)
• Can we do these things on BigML’s infrastructure?
Compute Gradients From Probabilities
• Let’s just focus on computing the gradients for a moment
• Get the predictions from the previous iteration
The sum of all of the previous gradient steps is stored in a column
If this is the first iteration, assume the uniform distribution
• Gradient for class k is just y − p(k) where y is 1 if the point’s class
is k and 0 otherwise.
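As a small worked example of y − p(k), here is a Python sketch for one point with three classes. The function name gradients is an assumption for this illustration; the example point belongs to the first class and currently has probabilities 0.6, 0.3 and 0.1.

```python
from typing import List


def gradients(class_index: int, probs: List[float]) -> List[float]:
    """Per-class gradient y - p(k), where y is 1 for the point's class, else 0."""
    return [
        (1.0 if k == class_index else 0.0) - p
        for k, p in enumerate(probs)
    ]


grad = gradients(0, [0.6, 0.3, 0.1])
print([round(g, 3) for g in grad])

# First iteration: no predictions yet, so assume the uniform distribution.
first = gradients(0, [1 / 3, 1 / 3, 1 / 3])
print([round(g, 3) for g in first])
```

The gradient is positive for the true class and negative for the others, so each step pushes probability toward the correct class.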
Computing Gradients
Features      Class Matrix   Current Probs    Gradients
0.2    10     1  0  0        0.6  0.3  0.1     0.4  -0.3  -0.1
0.3    12     0  1  0        0.4  0.4  0.2    -0.4   0.6  -0.2
0.15   10     1  0  0        0.8  0.1  0.1     0.2  -0.1  -0.1
0.3    -5     0  0  1        0.2  0.3  0.5    -0.2  -0.3   0.5
Aside: WhizzML + Flatline
• How can we do computations on the data?
Use Flatline: A language for data manipulation
Executed in BigML as a Dataset Transformation
https://github.com/bigmlcom/flatline/blob/master/
user-manual.md
• Benefits
Arbitrary operations on the data are now API calls
Computational details are taken care of
Upload your data once, do anything to it
• Flatline is a First-class Citizen of WhizzML
Creating a new feature in Flatline
• We need to subtract one column value from another
• Flatline provides the f operator to get a named field value from
any row
(- (f "actual") (f "predicted"))
• But remember, if we have n classes, we also have n gradients to
construct!
• Enter WhizzML!
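The idea of generating one Flatline expression per class can be sketched in Python as plain string construction. This is not WhizzML and does not call BigML; the field names truth_0, prob_0 and so on are hypothetical, and only the (- (f ...) (f ...)) shape is taken from the slide.

```python
from typing import List


def gradient_expressions(truth_names: List[str], pred_names: List[str]) -> List[str]:
    """Build one Flatline subtraction expression per class."""
    return [
        f'(- (f "{truth}") (f "{pred}"))'
        for truth, pred in zip(truth_names, pred_names)
    ]


nclasses = 3
truths = [f"truth_{k}" for k in range(nclasses)]  # hypothetical field names
preds = [f"prob_{k}" for k in range(nclasses)]    # hypothetical field names

expressions = gradient_expressions(truths, preds)
for expr in expressions:
    print(expr)
```

Writing the n expressions by hand does not scale with the number of classes; generating them programmatically is what the surrounding language contributes.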
Compute Gradients: Code
(define (compute-gradient dataset nclasses iteration)
  (let (next-names (grad-names nclasses iteration)
        preds (if (> iteration 0)
                (map (lambda (n) (flatline "(f {{n}})"))
                     (softmax-names nclasses iteration))
                (repeat nclasses (str (/ 1 nclasses))))
        tns (truth-names nclasses)
        fexp (lambda (idx)
               (let (actual (nth tns idx)
                     predicted (nth preds idx))
                 (flatline "(- (f {{actual}}) {predicted})")))
        new-fields (make-fields next-names (map fexp (range nclasses))))
    (add-fields dataset new-fields [])))
What Have We Learned?
• You can implement workflows of arbitrary complexity with WhizzML
• The power of WhizzML with Flatline
• Editorial: The Commodification of Machine Learning Algorithms
    Every language has its own ML algorithms now
    With WhizzML, implement once and use anywhere
    Never worry about architecture again
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 

Advanced WhizzML Workflows

  • 1. Advanced WhizzML Workflows The BigML Team May 2016 The BigML Team Advanced WhizzML Workflows May 2016 1 / 34
  • 2. Outline
      1 Introduction
      2 Advanced Workflows
      3 A WhizzML Implementation of Best-first Feature Selection
      4 Even More Workflows!
      5 Stacked Generalization in WhizzML
      6 A Brief Look at Gradient Boosting in WhizzML
      7 Wrapping Up
  • 4. What Do We Know About WhizzML?
      • It’s a complete programming language
      • Machine learning “operations” are first-class
      • Those operations are performed in BigML’s backend
          One line of code to perform API requests
          We get scale “for free”
      • Everything is composable
          Functions
          Libraries
          The Web Interface
  • 5. What Can We Do With It?
      • Non-trivial Model Selection
          n-fold cross validation
          Comparison of model types (tree, ensemble, logistic)
      • Automation of Drudgery
          One-click retraining/validation
          Standardized dataset transformations / cleaning
      • Sure, but what else?
  • 7. Algorithms as Workflows
      • Many ML algorithms can be thought of as workflows
      • In these algorithms, machine learning operations are the primitives
          Make a model
          Make a prediction
          Evaluate a model
      • Many such algorithms can be implemented in WhizzML
          Reap the advantages of BigML’s infrastructure
          Once implemented, it is language-agnostic
  • 8. Examples: Best-first Feature Selection
      Objective: Select the n best features for modeling your data
      • Initialize a set S of used features as the empty set
      • Split your dataset into training and test sets
      • For i in 1 … n
          For each feature f not in S, model and evaluate with feature set S + f
          Greedily select f̂, the feature with the best performance, and set S ← S + f̂
      https://github.com/whizzml/examples/tree/master/best-first
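The greedy loop on this slide can be sketched locally in Python. This is a minimal illustration, not the WhizzML implementation: `score` is a hypothetical stand-in for "train a model on these features and evaluate it on the test set", and the feature names and weights in `toy_score` are invented.

```python
from typing import Callable, List, Sequence


def best_first(features: Sequence[str],
               n: int,
               score: Callable[[List[str]], float]) -> List[str]:
    """Greedily pick up to n features, adding the best-scoring one each round."""
    selected: List[str] = []
    remaining = list(features)
    while len(selected) < n and remaining:
        # Evaluate every candidate set S + f and keep the best f.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical scorer: each feature adds a fixed amount, but "age" and
# "dob" are redundant, so having both adds nothing beyond "age" alone.
WEIGHTS = {"age": 0.30, "dob": 0.29, "income": 0.25, "zip": 0.05}


def toy_score(feats: List[str]) -> float:
    total = sum(WEIGHTS[f] for f in feats)
    if "age" in feats and "dob" in feats:
        total -= WEIGHTS["dob"]
    return total


chosen = best_first(list(WEIGHTS), 2, toy_score)
print(chosen)
```

Note that the second pick is not simply the second-best single feature: once "age" is selected, the redundant "dob" no longer helps, which is the reason to re-evaluate every candidate in each round.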
  • 10. Modeling
      First, construct a bunch of models. selected is the list of features that have already been selected, and potentials are the candidates we might select on this iteration.
      (define (make-models dataset-id obj-field selected potentials)
        (let (model-req {"dataset" dataset-id "objective_field" obj-field}
              ;; One request per candidate: the candidate plus everything selected so far
              make-req (lambda (fid)
                         (assoc model-req "input_fields" (cons fid selected)))
              all-reqs (map make-req potentials))
          (create-and-wait* "model" all-reqs)))
  • 11. Evaluation
      Now, conduct the evaluations. potentials is again the list of potential features to add, and model-ids is the list of corresponding model IDs created in the last step.
      (define (select-feature test-dataset-id potentials model-ids)
        (let (eval-req {"dataset" test-dataset-id}
              make-req (lambda (mid) (assoc eval-req "model" mid))
              all-reqs (map make-req model-ids)
              evs (map fetch (create-and-wait* "evaluation" all-reqs))
              vs (map (lambda (ev) (get-in ev ["result" "model" "average_phi"])) evs)
              value-map (make-map potentials vs) ;; e.g., {"000000" 0.8 "000001" 0.7}
              max-val (get-max vs)
              choose-best (lambda (id) (if (= max-val (get value-map id)) id false)))
          ;; Return the first candidate whose score equals the maximum
          (some choose-best potentials)))
  • 12. Main Loop
      The main loop of the algorithm. Set up your objective ID, inputs, and training and test datasets. Initialize the selected features to the empty set and iteratively call the previous two functions.
      (define (select-features dataset-id nfeatures)
        (let (obj-id (dataset-get-objective-id dataset-id)
              input-ids (default-inputs dataset-id obj-id)
              splits (split-dataset dataset-id 0.5)
              train-id (nth splits 0)
              test-id (nth splits 1))
          (loop (selected []
                 potentials input-ids)
            (if (or (>= (count selected) nfeatures)
                    (empty? potentials))
              (feature-names dataset-id selected)
              ;; Train on the training split only, so the test split stays unseen
              (let (model-ids (make-models train-id obj-id selected potentials)
                    next-feat (select-feature test-id potentials model-ids))
                (recur (cons next-feat selected)
                       (filter (lambda (id) (not (= id next-feat))) potentials)))))))
  • 14. Examples: Stacked Generalization
      Objective: Improve predictions by modeling the output scores of multiple trained models.
      • Create a training and a holdout set
      • Create n different models on the training set (with some difference among them; e.g., single-tree vs. ensemble vs. logistic regression)
      • Make predictions from those models on the holdout set
      • Train a model to predict the class based on the other models’ predictions
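A toy version of these steps in plain Python may help fix the idea. Everything here is an assumption for illustration: the two base models are hand-written rules standing in for models already trained on the training set, the rows have two made-up numeric features, and the meta-model is a simple lookup table rather than a tree.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[float, float]      # two hypothetical numeric features
Model = Callable[[Row], str]   # a trained base model: row -> predicted class


def fit_meta(base_models: Sequence[Model],
             holdout: Sequence[Tuple[Row, str]]) -> Dict[Tuple[str, ...], str]:
    """Map each tuple of base predictions to the most common true class."""
    votes: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for row, label in holdout:
        key = tuple(m(row) for m in base_models)
        votes[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def stack_predict(base_models: Sequence[Model],
                  meta: Dict[Tuple[str, ...], str],
                  row: Row,
                  default: str = "no") -> str:
    """Predict with the base models first, then let the meta-model decide."""
    key = tuple(m(row) for m in base_models)
    return meta.get(key, default)


# Stand-ins for models trained on the training set.
base = [lambda r: "yes" if r[0] > 0.5 else "no",
        lambda r: "yes" if r[1] > 0.5 else "no"]

# Holdout set: the true class is "yes" only when both features are high.
holdout = [((0.9, 0.9), "yes"), ((0.9, 0.1), "no"),
           ((0.1, 0.9), "no"), ((0.1, 0.1), "no")]

meta = fit_meta(base, holdout)
print(stack_predict(base, meta, (0.8, 0.7)))
print(stack_predict(base, meta, (0.8, 0.2)))
```

Neither base rule is right on its own here; the meta-model learns from the holdout predictions that both must agree on "yes".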
  • 15. Examples: Randomized Parameter Optimization
      Objective: Find the best set of parameters for a machine learning algorithm
      • Do:
          Generate a random set of parameters for an ML algorithm
          Do 10-fold cross-validation with those parameters
      • Until you get a set of parameters that performs “well” or you get bored
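The do/until loop above, sketched in Python under stated assumptions: `evaluate` is a hypothetical stand-in for 10-fold cross-validation, the two parameters (`depth`, `learning_rate`) and their ranges are invented, and "getting bored" is modeled as a cap on the number of tries.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(evaluate: Callable[[Params], float],
                  good_enough: float,
                  max_tries: int,
                  seed: int = 0) -> Tuple[Params, float]:
    """Sample random parameter sets until one scores well or tries run out."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(max_tries):               # "until you get bored"
        params = {"depth": rng.randint(1, 10),
                  "learning_rate": rng.uniform(0.01, 1.0)}
        score = evaluate(params)             # stands in for cross-validation
        if score > best_score:
            best_params, best_score = params, score
        if best_score >= good_enough:        # "performs well"
            break
    return best_params, best_score


def toy_cv_score(p: Params) -> float:
    # Hypothetical response surface that peaks at depth 6, learning_rate 0.3.
    return 1.0 - abs(p["depth"] - 6) / 10 - abs(p["learning_rate"] - 0.3)


best, best_score = random_search(toy_cv_score, good_enough=0.95, max_tries=200)
print(best, best_score)
```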
  • 16. Examples: SMACdown
      Objective: Find the best set of parameters even more quickly!
      • Do:
          Generate several random sets of parameters for an ML algorithm
          Do 10-fold cross-validation with those parameters
          Learn a predictive model to predict performance from parameter values
          Use the model to help you select the next set of parameters to evaluate
      • Until you get a set of parameters that performs “well” or you get bored
      Coming soon to a WhizzML gallery near you!
  • 17. Examples: Boosting
      • General idea: Iteratively model the dataset
          Each iteration is trained on the mistakes of previous iterations
          Said another way, the objective changes each iteration
          The final model is a summation of all iterations
      • Lots of variations on this theme
          AdaBoost
          LogitBoost
          Martingale Boosting
          Gradient Boosting
      • Let’s take a look at a WhizzML implementation of the last of these
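To make the general idea concrete, here is a small Python sketch of boosting for squared-error regression on one numeric feature. It is a deliberately simpler setting than the multi-class gradient boosting covered later in the deck, and the stump learner, learning rate, and data points are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Find the one-split regressor that best fits the current residuals."""
    best_err, best_stump = float("inf"), (0.0, 0.0, 0.0)
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else 0.0
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if err < best_err:
            best_err, best_stump = err, (t, lv, rv)
    return best_stump


def predict(stumps: List[Stump], x: float, rate: float = 0.5) -> float:
    """The final model is the (scaled) sum of every iteration's output."""
    return sum(rate * (lv if x <= t else rv) for t, lv, rv in stumps)


def boost(xs: Sequence[float], ys: Sequence[float],
          rounds: int, rate: float = 0.5) -> List[Stump]:
    stumps: List[Stump] = []
    for _ in range(rounds):
        # The objective changes each iteration: fit what is still unexplained.
        residuals = [y - predict(stumps, x, rate) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps


def mse(stumps: List[Stump], xs: Sequence[float],
        ys: Sequence[float], rate: float = 0.5) -> float:
    return sum((y - predict(stumps, x, rate)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
errors = [mse(boost(xs, ys, n), xs, ys) for n in (0, 1, 10)]
print(errors)
```

The training error shrinks as rounds are added, because each new stump is fitted to the residuals left by the sum of the earlier ones.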
  • 19. A Stacked generalization library: creating the stack
      ;; Splits the given dataset, using half of it to create
      ;; a heterogeneous collection of models and the other
      ;; half to train a tree that predicts based on those other
      ;; models' predictions. Returns a map with the collection
      ;; of models (under the key "models") and the meta-model
      ;; as the value of the key "metamodel". The key "result"
      ;; has as value a boolean flag indicating whether the
      ;; process was successful.
      (define (make-stack dataset-id)
        (let (ids (split-dataset-and-wait dataset-id 0.5)
              train-id (nth ids 0)
              hold-id (nth ids 1)
              models (create-stack-models train-id)
              id (create-stack-predictions models hold-id)
              orig-fields (model-inputs (head models))
              obj-id (dataset-get-objective-id train-id)
              meta-id (create-and-wait-model {"dataset" id
                                              "excluded_fields" orig-fields
                                              "objective_field" obj-id})
              success? (resource-done? (fetch meta-id)))
          {"models" models
           "metamodel" meta-id
           "result" success?}))
  • 20. A Stacked generalization library: using the stack
      ;; Use the models and meta-model computed by make-stack
      ;; to make a prediction on the input-data map. Returns
      ;; the identifier of the prediction object.
      (define (make-stack-prediction models meta-model input-data)
        (let (preds (map (lambda (m)
                           (create-prediction {"model" m
                                               "input_data" input-data}))
                         models)
              preds (map (lambda (p)
                           (head (values (get (fetch p) "prediction"))))
                         preds)
              meta-input (make-map (model-inputs meta-model) preds))
          (create-prediction {"model" meta-model "input_data" meta-input})))
  • 21. A Stacked generalization library: auxiliary functions
      ;; Extract from a batchprediction its associated dataset of results
      (define (batch-dataset id)
        (wait-forever (get (fetch id) "output_dataset_resource")))

      ;; Create a batchprediction for the given model and dataset,
      ;; with a map of additional options and using defaults appropriate
      ;; for model stacking
      (define (make-batch ds-id mod-id opts)
        (create-batchprediction (merge {"all_fields" true
                                        "output_dataset" true
                                        "dataset" ds-id
                                        "model" (wait-forever mod-id)}
                                       opts))) ;; merge in the caller-supplied options

      ;; Auxiliary function extracting the input fields of a model
      (define (model-inputs mod-id)
        (get (fetch mod-id) "input_fields"))
  • 23. Library-based scripts
      Script for creating the models
      (define stack (make-stack dataset-id))
      Script for predictions using the stack
      (define (make-prediction exec-id input-data)
        (let (exec (fetch exec-id)
              stack (nth (head (get-in exec ["execution" "outputs"])) 1)
              models (get stack "models")
              metamodel (get stack "metamodel"))
          (when (get stack "result")
            ;; Pass the caller's input data through to the stack
            (try (make-stack-prediction models metamodel input-data)
                 (catch e (log-info "Error: " e) false)))))
      (define prediction-id (make-prediction exec-id input-data))
      (define prediction (when prediction-id (fetch prediction-id)))
      https://github.com/whizzml/examples/tree/master/stacked-generalizati
  • 25. The Main Loop
      • Given the currently predicted class probabilities, compute a gradient step that will push those probabilities in the right direction
      • Learn regression trees to represent this step over the training set
      • Make a prediction with each tree
      • Sum this prediction with all gradient steps so far to get a set of scores for each point in the training data (one score for each class)
      • Apply the softmax function to these sums to get a set of class probabilities for each point
      • Iterate!
      Clone it here: https://github.com/whizzml/examples/tree/master/gradient-boosting
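The "sum, then softmax" part of the loop can be shown for a single training point in Python. This is only a local sketch: the per-class tree outputs are hypothetical numbers, and the function names are not taken from the WhizzML code.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn per-class scores into probabilities that sum to 1."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_step(summed_scores: List[float], tree_outputs: List[float]) -> List[float]:
    """Add this iteration's per-class tree outputs to the running sums."""
    return [s + t for s, t in zip(summed_scores, tree_outputs)]


scores = [0.0, 0.0, 0.0]                      # before any gradient step
probs0 = softmax(scores)                      # uniform over the three classes
scores = add_step(scores, [0.4, -0.3, -0.1])  # hypothetical tree predictions
probs1 = softmax(scores)
print(probs0, probs1)
```

After one step the probability of the first class rises above one third and the other two fall, which is the "push in the right direction" the slide describes.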
  • 26. What will this look like in WhizzML?
      • Several things here are machine learning operations
          Constructing gradient models
          Making predictions
      • But several are not
          Summing the gradient steps
          Computing softmax probabilities
          Computing gradients
      • We don’t want to do those things locally (data size, resource concerns)
      • Can we do these things on BigML’s infrastructure?
  • 27. Compute Gradients From Probabilities
      • Let’s just focus on computing the gradients for a moment
      • Get the predictions from the previous iteration
          The sum of all of the previous gradient steps is stored in a column
          If this is the first iteration, assume the uniform distribution
      • Gradient for class k is just y − p(k), where y is 1 if the point’s class is k and 0 otherwise
  • 28. Computing Gradients
      Features      Class Matrix    Current Probs
      0.2    10     1  0  0         0.6  0.3  0.1
      0.3    12     0  1  0         0.4  0.4  0.2
      0.15   10     1  0  0         0.8  0.1  0.1
      0.3    -5     0  0  1         0.2  0.3  0.5
  • 29. Computing Gradients
      Features      Class Matrix    Current Probs    Gradients
      0.2    10     1  0  0         0.6  0.3  0.1     0.4  -0.3  -0.1
      0.3    12     0  1  0         0.4  0.4  0.2    -0.4   0.6  -0.2
      0.15   10     1  0  0         0.8  0.1  0.1     0.2  -0.1  -0.1
      0.3    -5     0  0  1         0.2  0.3  0.5    -0.2  -0.3   0.5
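The gradient column can be checked by applying y − p(k) to the class matrix and current probabilities given on the slide. The Python below is a local check only; rounding to two decimals is just for display.

```python
from typing import List


def gradients(one_hot: List[int], probs: List[float]) -> List[float]:
    """Per-class gradient y - p(k) for a single training point."""
    return [y - p for y, p in zip(one_hot, probs)]


# (class matrix row, current probabilities) for the four points on the slide
rows = [
    ([1, 0, 0], [0.6, 0.3, 0.1]),
    ([0, 1, 0], [0.4, 0.4, 0.2]),
    ([1, 0, 0], [0.8, 0.1, 0.1]),
    ([0, 0, 1], [0.2, 0.3, 0.5]),
]
table = [[round(g, 2) for g in gradients(y, p)] for y, p in rows]
for row in table:
    print(row)
```

Each row of gradients sums to zero, because the class matrix row and the probabilities both sum to one. For the first point this gives 0.4, −0.3, −0.1.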
  • 30. Aside: WhizzML + Flatline
      • How can we do computations on the data?
          Use Flatline: A language for data manipulation
          Executed in BigML as a Dataset Transformation
          https://github.com/bigmlcom/flatline/blob/master/user-manual.md
      • Benefits
          Arbitrary operations on the data are now API calls
          Computational details are taken care of
          Upload your data once, do anything to it
      • Flatline is a First-class Citizen of WhizzML
  • 31. Creating a new feature in Flatline
      • We need to subtract one column value from another
      • Flatline provides the f operator to get a named field value from any row
          (- (f "actual") (f "predicted"))
      • But remember, if we have n classes, we also have n gradients to construct!
      • Enter WhizzML!
  • 32. Compute Gradients: Code
      (define (compute-gradient dataset nclasses iteration)
        (let (next-names (grad-names nclasses iteration)
              ;; Current probabilities: softmax columns, or uniform on the first iteration
              preds (if (> iteration 0)
                      (map (lambda (n) (flatline "(f {{n}})"))
                           (softmax-names nclasses iteration))
                      (repeat nclasses (str (/ 1 nclasses))))
              tns (truth-names nclasses)
              ;; Flatline expression for the gradient of class idx: truth minus probability
              fexp (lambda (idx)
                     (let (actual (nth tns idx)
                           predicted (nth preds idx))
                       (flatline "(- (f {{actual}}) {predicted})")))
              new-fields (make-fields next-names (map fexp (range nclasses))))
          (add-fields dataset new-fields [])))
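What this function generates is one Flatline expression per class. The Python sketch below mimics only that string-building step so the output can be inspected locally. The field names (`truth_0`, `prob_0`, and so on) are hypothetical; the real names come from helpers such as truth-names and softmax-names, which are not shown in the deck.

```python
from typing import List


def gradient_expressions(truth_names: List[str],
                         prob_names: List[str]) -> List[str]:
    """One Flatline expression per class: truth column minus probability column."""
    return [f'(- (f "{t}") (f "{p}"))' for t, p in zip(truth_names, prob_names)]


def first_iteration_expressions(truth_names: List[str]) -> List[str]:
    """On the first iteration there are no probability columns yet: use uniform."""
    uniform = 1 / len(truth_names)
    return [f'(- (f "{t}") {uniform})' for t in truth_names]


truth = ["truth_0", "truth_1", "truth_2", "truth_3"]   # hypothetical field names
probs = ["prob_0", "prob_1", "prob_2", "prob_3"]       # hypothetical field names

later = gradient_expressions(truth, probs)
first = first_iteration_expressions(truth)
print(later[0])
print(first[0])
```

Building the expressions in a loop is the point of the slide: with n classes there are n gradient columns, and writing each Flatline expression by hand does not scale.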
  • 34. What Have We Learned?
      • You can implement workflows of arbitrary complexity with WhizzML
      • The power of WhizzML with Flatline
      • Editorial: The Commodification of Machine Learning Algorithms
          Every language has its own ML algorithms now
          With WhizzML, implement once and use anywhere
          Never worry about architecture again