Beware… for it’s the...
Vowpal Platypus
Peter Hurford
(With a little help from some friends)
WE OFTEN WANT TO PREDICT STUFF...
...BUT WE RUN INTO LIMITATIONS.
× ...Data set is too large, it doesn’t fit in RAM.
× ...Data set is so large, it doesn’t fit on disk!
× ...Model training is so slow, you can’t iterate and try things.
“I want to use parallel learning algorithms to create fantastic learning machines!”
- John Langford, 1997
YOU FOOL! THE ONLY THING PARALLEL MACHINES ARE USEFUL FOR ARE COMPUTATIONAL WINDTUNNELS!
TEN YEARS LATER...
VOWPAL
...Fast Online Learning
...WHAT’S WITH THE NAME?
WHAT DOES IT DO?
Traditional Approach
1. Load all training data into RAM at once.
2. Fit model to training dataset.
3. Load all predicting data into RAM at once.
4. Use trained model to make predictions.
VW “Online” Approach
1. Train model on single datapoints, one at a time.
2. Do it again multiple times.
3. Use trained model to predict on new datapoints, one at a time.
× Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations.
× But you’re no longer dependent on RAM!
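The online approach can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses textbook stochastic gradient descent for logistic regression, not VW's actual update rule, and `stream_examples`, `predict` and `train_online` are hypothetical names made up for this sketch. The generator stands in for a file or socket, so only one datapoint is ever held in memory.

```python
import math
import random


def stream_examples(n, seed=0):
    """Yield (features, label) pairs one at a time, standing in for a file or socket."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        label = 1 if x[0] + x[1] > 0 else 0
        yield x, label


def predict(weights, bias, x):
    """Logistic prediction for a single datapoint."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_online(passes=3, n=2000, learning_rate=0.5):
    """Update the model after every example; the dataset is never loaded whole."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(passes):  # "Do it again multiple times."
        for x, label in stream_examples(n):
            error = predict(weights, bias, x) - label
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train_online()

# Predict on new datapoints, again one at a time.
accuracy = sum(
    (predict(weights, bias, x) > 0.5) == bool(label)
    for x, label in stream_examples(500, seed=1)
) / 500
print(f"held-out accuracy: {accuracy:.2f}")
```

Memory use here depends on the number of weights, not on the number of training examples, which is the point of the slide.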
IS IT ANY GOOD?
Kaggle: World Data Science Competitions
× 3rd, 14th, and 29th / 718 on $16K Criteo ad click challenge
× 3rd / 472 on $2K KDD Cup Challenge
× 8th / 128 on $25K Avito.ru illicit content filtering challenge
DID I MENTION IT’S FAST?
× szilard/benchm-ml: widely cited (1,127 stars) independent ML speed benchmarks.
× Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM).
Engine          Speed
Python Sklearn  Crashed
R               90 sec
Vowpal Wabbit   15 sec
Spark           35 sec
Yes, this was Spark 2.0, but it was using MLlib. ML performance is under testing now.
But this benchmark was only single core!
...and none of the benchmarks include data load time! (VP has none.)
...BUT WHAT’S THIS ABOUT A PLATYPUS?
WHAT IS VOWPAL PLATYPUS?
× An open source vehicle for productionizing Vowpal Wabbit in Python.
× Train and predict on Python dictionaries instead of the obscure VW format.
× Easily use VW’s parallel features to go multicore and multi-machine.
VW has been used on “terascale datasets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines.”
...so far VP has only been used on a maximum of 3 machines (108 cores combined), but we’re getting there...
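To make the "Python dictionaries instead of the obscure VW format" point concrete, here is a rough sketch of what such a conversion might look like. It is not Vowpal Platypus's actual API: `to_vw_line` and `_clean` are hypothetical helpers, and the dictionary layout is an assumption, since the slides do not show VP's schema. The output follows the basic shape of VW's text input, a label followed by `|namespace feature[:value] ...` groups.

```python
def _clean(token):
    """Replace characters that are special in VW's text format (space, ':' and '|')."""
    return str(token).replace(" ", "_").replace(":", "_").replace("|", "_")


def to_vw_line(label, namespaces):
    """Render one example as a line of VW text input.

    `namespaces` maps a namespace name to either a list of categorical
    feature names or a dict of {feature_name: numeric_value}.
    (Hypothetical layout, chosen for this sketch only.)
    """
    parts = [str(label)]
    for name, features in namespaces.items():
        if isinstance(features, dict):
            tokens = [f"{_clean(k)}:{v}" for k, v in features.items()]
        else:
            tokens = [_clean(f) for f in features]
        parts.append("|" + _clean(name) + " " + " ".join(tokens))
    return " ".join(parts)


line = to_vw_line(
    1,
    {
        "user": ["id_42"],
        "movie": {"year": 1999, "rating_count": 250},
    },
)
print(line)  # 1 |user id_42 |movie year:1999 rating_count:250
```

A wrapper like this lets the calling code stay in ordinary Python data structures while the text format is generated in one place.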
DEMO #1!
DEMO #2!
27,279 MOVIES & 138,494 USERS
3,757,977,826 PREDICTIONS... need to be made.
21m47s total runtime on 3x c4.8xlarge (108 cores total)
342 nanoseconds per prediction (wall clock time)
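As a quick sanity check on these figures, the arithmetic can be reproduced directly from the numbers on the slide. Two observations, offered tentatively: the full movie-by-user grid comes to 3,777,977,826 pairs, exactly 20,000,000 more than the slide's prediction count, which would be consistent with skipping pairs that already have ratings, though the slides do not say so. And dividing 21m47s by the slide's prediction count gives roughly 348 ns, close to the quoted 342 ns; the source does not explain the small gap.

```python
movies = 27_279
users = 138_494
predictions = 3_757_977_826   # figure quoted on the slide
runtime_s = 21 * 60 + 47      # 21m47s of wall-clock time
cores = 108                   # 3x c4.8xlarge, as stated on the slide

all_pairs = movies * users
ns_per_prediction = runtime_s / predictions * 1e9
# Wall-clock time is shared by all cores, so per-core cost is 108x larger.
core_us_per_prediction = ns_per_prediction * cores / 1e3

print(f"movie-user pairs:                 {all_pairs:,}")
print(f"pairs minus slide's predictions:  {all_pairs - predictions:,}")
print(f"wall-clock ns per prediction:     {ns_per_prediction:.0f}")
print(f"core-microseconds per prediction: {core_us_per_prediction:.1f}")
```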
THE END! (...OR IS IT?)

Fast Online Machine Learning with Vowpal Wabbit and Python
