Vowpal Platypus is a general-purpose, lightweight Python wrapper built on Vowpal Wabbit that uses online learning to achieve great results. https://github.com/peterhurford/vowpal_platypus
3. WE OFTEN WANT TO PREDICT STUFF…
...BUT WE RUN INTO LIMITATIONS.
× ...Data set is too large, it doesn’t fit in RAM.
× ...Data set is so large, it doesn’t fit on disk!
× ...Model train time is so slow, you can’t iterate and try things.
7. “I want to use parallel learning algorithms to create fantastic learning machines!”
- John Langford, 1997
8. YOU FOOL! THE ONLY THING PARALLEL MACHINES ARE USEFUL FOR ARE COMPUTATIONAL WINDTUNNELS!
15. WHAT DOES IT DO?
Traditional Approach
1. Load all training data into RAM at once.
2. Fit model to training dataset.
3. Load all prediction data into RAM at once.
4. Use trained model to make predictions.
16. VW “Online” Approach
1. Train model on single datapoints, one at a time.
2. Do it again multiple times.
3. Use trained model to predict on new datapoints, one at a time.
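The online steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the idea using logistic regression trained by stochastic gradient descent, not Vowpal Wabbit's or Vowpal Platypus's actual implementation; the names `train_online`, `predict_one`, and `toy_stream` are hypothetical, and the toy data is invented for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_one(weights, features):
    """Predict P(label = 1) for one datapoint given as {feature_name: value}."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def train_online(stream_factory, passes=5, learning_rate=0.1):
    """Update the model on one datapoint at a time, repeating for several passes.

    stream_factory is a hypothetical callable that returns a fresh iterator of
    (features, label) pairs, so only one datapoint is in memory at a time.
    """
    weights = {}
    for _ in range(passes):
        for features, label in stream_factory():
            error = predict_one(weights, features) - label
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights


def toy_stream():
    # Invented toy data: the label is 1 exactly when x is positive.
    rng = random.Random(0)
    for _ in range(1000):
        x = rng.uniform(-1.0, 1.0)
        yield {"bias": 1.0, "x": x}, 1 if x > 0 else 0


weights = train_online(toy_stream, passes=5)
p_pos = predict_one(weights, {"bias": 1.0, "x": 0.8})
p_neg = predict_one(weights, {"bias": 1.0, "x": -0.8})
print(p_pos, p_neg)
```

Because the trainer only ever touches the current datapoint and the weight dictionary, the stream could just as well be read lazily from a file or a socket.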
17. × Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations.
× But you’re no longer dependent on RAM!
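The convergence claim can be checked on a toy problem. The sketch below fits a straight line two ways: a batch least-squares fit that needs all the data at once, and an online fit that sees one datapoint at a time over many passes with a decaying learning rate. The data, the function names, and the learning-rate schedule are all assumptions made for the illustration; VW's own update rule is more sophisticated.

```python
import random


def make_data(n=200, seed=0):
    # Invented toy data: y = 2x + 1 plus a little Gaussian noise.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def fit_batch(data):
    """Closed-form least squares; requires the whole dataset in memory."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def fit_online(stream_factory, passes=50, base_rate=0.1, decay=0.001):
    """Stochastic gradient descent on squared error, one datapoint at a time."""
    slope, intercept, step = 0.0, 0.0, 0
    for _ in range(passes):
        for x, y in stream_factory():
            rate = base_rate / (1.0 + decay * step)  # shrink the step size over time
            error = (slope * x + intercept) - y
            slope -= rate * error * x
            intercept -= rate * error
            step += 1
    return slope, intercept


data = make_data()
batch_slope, batch_intercept = fit_batch(data)
# The list is reused here only so both fits see identical data; the online
# fit itself needs nothing more than an iterator over datapoints.
online_slope, online_intercept = fit_online(lambda: iter(data))
print(batch_slope, online_slope)
print(batch_intercept, online_intercept)
```

The online fit keeps only two numbers and the current datapoint in memory, which is why the approach stops depending on RAM as the dataset grows.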
19. IS IT ANY GOOD?
Kaggle: World Data Science Competitions
× 3rd, 14th, and 29th / 718 on $16K Criteo ad click challenge
× 3rd / 472 on $2K KDD Cup Challenge
× 8th / 128 on $25K Avito.ru illicit content filtering challenge
20. DID I MENTION IT’S FAST?
× szilard/benchm-ml: widely cited (1,127 stars) independent ML speed benchmarks.
× Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM).

Engine            Speed
Python Sklearn    Crashed
R                 90 sec
Vowpal Wabbit     15 sec
Spark             35 sec
Yes, this was Spark 2.0, but it was using MLlib. ML performance is under testing now.
But this benchmark was only single core!
...and none of the benchmarks include data load time! (VP has none.)
25. WHAT IS VOWPAL PLATYPUS?
× An open source vehicle for productionizing Vowpal Wabbit in Python.
× Train and predict on Python dictionaries instead of the obscure VW format.
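To see what is being hidden, here is a minimal sketch of turning a Python dictionary into one line of VW's text input format (a label, then `|namespace` groups of space-separated features, each with an optional `:value`). The helper `to_vw_line` and the example fields are hypothetical and are not Vowpal Platypus's actual API; the sketch also assumes feature names contain no spaces, colons, or pipes, which VW treats as separators.

```python
def to_vw_line(example, label=None):
    """Format {namespace: {feature: value}} as a single VW input line.

    Hypothetical helper for illustration only. Numeric values become
    `name:value`; anything else becomes a `name=value` token, which VW
    reads as an indicator feature with an implicit value of 1.
    """
    parts = [] if label is None else [str(label)]
    for namespace, features in example.items():
        tokens = []
        for name, value in features.items():
            if isinstance(value, (int, float)):
                tokens.append(f"{name}:{value}")
            else:
                tokens.append(f"{name}={value}")
        parts.append("|" + namespace + " " + " ".join(tokens))
    return " ".join(parts)


# Invented example record with two namespaces.
record = {"user": {"age": 25, "gender": "F"}, "movie": {"year": 1995}}
train_line = to_vw_line(record, label=1)
predict_line = to_vw_line(record)
print(train_line)
print(predict_line)
```

A wrapper that accepts dictionaries can do this conversion internally, so callers never have to write or escape VW lines by hand.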
× Easily use VW’s parallel features to go multicore and multi-machine.
VW has been used on “terascale datasets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines.”
...so far VP has only been used on a maximum of 3 machines (combined 108 cores), but we’re getting there...
39. DEMO #2!
27,279 movies & 138,494 users
3,757,977,826 predictions... need to be made.
Total runtime on 3x c4.8xlarge (108 cores total): 21m47s
342 nanoseconds per prediction (wall clock time)