Fast Online Machine Learning with Vowpal Wabbit and Python

Vowpal Platypus is a general-purpose, lightweight Python wrapper built on Vowpal Wabbit that uses online learning to achieve great results. https://github.com/peterhurford/vowpal_platypus

BEWARE… FOR IT’S THE...
VOWPAL PLATYPUS
Peter Hurford
(With a little help from some friends)

WE OFTEN WANT TO PREDICT STUFF…
...BUT WE RUN INTO LIMITATIONS.
× ...Data set is too large, it doesn’t fit in RAM.
× ...Data set is so large, it doesn’t fit on disk!
× ...Model train time is so slow, you can’t iterate and try things.

“I want to use parallel
learning algorithms to
create fantastic learning
machines!”
- John Langford, 1997

YOU FOOL! THE ONLY
THING PARALLEL
MACHINES ARE USEFUL
FOR ARE COMPUTATIONAL
WINDTUNNELS!

TEN YEARS LATER...
VOWPAL
...Fast Online Learning

WHAT DOES IT DO?

Traditional Approach
1. Load all training data into RAM at once.
2. Fit model to training dataset.
3. Load all predicting data into RAM at once.
4. Use trained model to make predictions.

VW “Online” Approach
1. Train model on single datapoints, one at a time.
2. Do it again multiple times.
3. Use trained model to predict on new datapoints, one at a time.
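
The three steps of the online approach can be sketched in plain Python. This is a minimal illustration, not Vowpal Wabbit's actual algorithm or API: the synthetic data stream, the function names, the logistic model, and the learning rate are all assumptions made for the example. The point it shows is that the training loop only ever holds one datapoint at a time.

```python
import math
import random

def stream_examples(n, seed=0):
    """Yield (features, label) pairs one at a time; no list is ever built."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Hypothetical toy task: label is 1 when x is positive.
        yield {"bias": 1.0, "x": x}, (1 if x > 0 else 0)

def predict(weights, features):
    """Logistic prediction from a dict of weights and a dict of features."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def train_online(weights, examples, learning_rate=0.5):
    """Step 1: update the model on single datapoints, one at a time."""
    for features, label in examples:
        error = predict(weights, features) - label
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights

weights = {}
for _ in range(3):  # Step 2: do it again multiple times (passes over the stream).
    train_online(weights, stream_examples(2000, seed=0))

# Step 3: predict on new datapoints, again one at a time.
held_out = list(stream_examples(200, seed=1))
accuracy = sum((predict(weights, f) > 0.5) == bool(y) for f, y in held_out) / len(held_out)
print(f"accuracy on new datapoints: {accuracy:.2f}")
```

Because `stream_examples` is a generator, memory use stays flat no matter how many examples flow through `train_online`; in practice the stream would be a file or socket rather than a random number generator.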

WHAT DOES IT DO?
× Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations.
× But you’re no longer dependent on RAM!
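
The convergence claim can be checked on a toy problem. The sketch below assumes noise-free data on the line y = 2x + 1 and a fixed learning rate of 0.5, both chosen only for illustration; with noisy data a decaying learning rate would be needed and the match would be approximate rather than exact. It compares a closed-form batch fit against one-datapoint-at-a-time gradient updates repeated over several passes.

```python
# Noise-free toy data on the line y = 2x + 1 (an assumption for illustration).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]

def fit_batch(xs, ys):
    """Closed-form least squares using the whole dataset at once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def fit_online(xs, ys, passes, learning_rate=0.5):
    """Stochastic gradient descent, updating on one datapoint at a time."""
    slope, intercept = 0.0, 0.0
    for _ in range(passes):
        for x, y in zip(xs, ys):
            error = (slope * x + intercept) - y
            slope -= learning_rate * error * x
            intercept -= learning_rate * error
    return slope, intercept

batch = fit_batch(xs, ys)
for passes in (1, 10, 100, 1000):
    # The online estimate moves toward the batch estimate as passes increase.
    print(passes, fit_online(xs, ys, passes), "batch:", batch)
```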

IS IT ANY GOOD?
Kaggle: World Data Science Competitions
× 3rd, 14th, and 29th / 718 on $16K Criteo ad click challenge
× 3rd / 472 on $2K KDD Cup Challenge
× 8th / 128 on $25K Avito.ru illicit content filtering challenge

DID I MENTION IT’S FAST?
× szilard/benchm-ml: widely cited (1127 star) independent ML speed benchmarks.
× Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM).

Engine           Speed
Python Sklearn   Crashed
R                90 sec
Vowpal Wabbit    15 sec
Spark            35 sec

WHAT IS VOWPAL PLATYPUS?
× An open source vehicle for productionizing Vowpal Wabbit in Python.
× Train and predict on Python dictionaries instead of the obscure VW format.
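
To show what "the VW format" looks like next to a Python dictionary, here is a small sketch. The helper `to_vw_line` and the example feature names are hypothetical and are not taken from Vowpal Platypus, whose own conversion code is not shown in these slides; the sketch only illustrates Vowpal Wabbit's plain-text input layout of a label followed by `|namespace feature:value` groups.

```python
def to_vw_line(label, namespaces):
    """Render one example in VW's plain-text input format.

    `namespaces` maps a namespace name to a dict of feature -> numeric value.
    Names must not contain spaces, ':' or '|', which VW treats as separators.
    (Hypothetical helper for illustration only.)
    """
    parts = [str(label)]
    for namespace, features in namespaces.items():
        tokens = [f"{name}:{value}" for name, value in features.items()]
        parts.append("|" + namespace + " " + " ".join(tokens))
    return " ".join(parts)

# A made-up example: one clicked ad, described by two namespaces.
example = {"user": {"age": 34, "visits": 12}, "ad": {"position": 1}}
line = to_vw_line(1, example)
print(line)  # 1 |user age:34 visits:12 |ad position:1
```

A wrapper that accepts dictionaries spares the caller from building and escaping such strings by hand.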

DEMO #2!
27,279 movies & 138,494 users
3,757,977,826 predictions... need to be made.
21m47s total runtime on 3x c4.8xlarge (108 cores total)
342 nanoseconds per prediction (wall clock time)
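
The demo figures can be related with a few lines of arithmetic. The sketch below uses only the numbers printed on the slide and assumes one prediction per (movie, user) pair. Multiplying the two counts as printed gives 3,777,977,826, which differs from the slide's prediction count in one digit, and dividing the stated runtime by either count gives roughly 346 to 348 ns rather than exactly 342 ns, so the slide's figures may have been rounded or measured slightly differently.

```python
movies = 27_279
users = 138_494
pairs = movies * users               # one prediction per (movie, user) pair

stated_predictions = 3_757_977_826   # figure as printed on the slide
runtime_seconds = 21 * 60 + 47       # 21m47s wall clock time

# Nanoseconds per prediction, using the slide's own prediction count.
ns_per_prediction = runtime_seconds / stated_predictions * 1e9

print(f"{pairs:,} movie-user pairs")
print(f"{ns_per_prediction:.0f} ns per prediction")
```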

Fast Online Machine Learning with Vowpal Wabbit and Python

1. BEWARE… FOR IT’S THE... VOWPAL PLATYPUS. Peter Hurford (With a little help from some friends)

2. WE OFTEN WANT TO PREDICT STUFF...

3. WE OFTEN WANT TO PREDICT STUFF… ...BUT WE RUN INTO LIMITATIONS.

4. WE OFTEN WANT TO PREDICT STUFF… ...BUT WE RUN INTO LIMITATIONS. × ...Data set is too large, it doesn’t fit in RAM.

5. WE OFTEN WANT TO PREDICT STUFF… ...BUT WE RUN INTO LIMITATIONS. × ...Data set is too large, it doesn’t fit in RAM. × ...Data set is so large, it doesn’t fit on disk!

6. WE OFTEN WANT TO PREDICT STUFF… ...BUT WE RUN INTO LIMITATIONS. × ...Data set is too large, it doesn’t fit in RAM. × ...Data set is so large, it doesn’t fit on disk! × ...Model train time is so slow, you can’t iterate and try things.

7. “I want to use parallel learning algorithms to create fantastic learning machines!” - John Langford, 1997

8. YOU FOOL! THE ONLY THING PARALLEL MACHINES ARE USEFUL FOR ARE COMPUTATIONAL WINDTUNNELS!

9. TEN YEARS LATER...

10. VOWPAL ...Fast Online Learning TEN YEARS LATER...

11. ...WHAT’s WITH THE NAME?

12. ...WHAT’s WITH THE NAME?

13. ...WHAT’s WITH THE NAME? +

14. ...WHAT’s WITH THE NAME? +

15. Traditional Approach 1. Load all training data into RAM at once. 2. Fit model to training dataset. 3. Load all predicting data into RAM at once. 4. Use trained model to make predictions. WHAT DOES IT DO?

16. VW “Online” Approach 1. Train model on single datapoints, one at a time. 2. Do it again multiple times. 3. Use trained model to predict on new datapoints, one at a time. Traditional Approach 1. Load all training data into RAM at once. 2. Fit model to training dataset. 3. Load all predicting data into RAM at once. 4. Use trained model to make predictions. WHAT DOES IT DO?

17. × Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations. WHAT DOES IT DO?

18. WHAT DOES IT DO? × Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations. × But you’re no longer dependent on RAM!

19. Kaggle: World Data Science Competitions × 3rd, 14th, and 29th / 718 on $16K Criteo ad click challenge × 3rd / 472 on $2K KDD Cup Challenge × 8th / 128 on $25K Avito.ru illicit content filtering challenge IS IT ANY GOOD?

20. × szilard/benchm-ml: widely cited (1127 star) independent ML speed benchmarks. × Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM). DID I MENTION IT’S FAST? Engine Speed Python Sklearn Crashed R 90sec Vowpal Wabbit 15sec Spark 35sec

21. × szilard/benchm-ml: widely cited (1127 star) independent ML speed benchmarks. × Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM). DID I MENTION IT’S FAST? Engine Speed Python Sklearn Crashed R 90sec Vowpal Wabbit 15sec Spark 35sec Yes, this was Spark 2.0, but it was using MLLib. ML performance is under testing now.

22. × szilard/benchm-ml: widely cited (1127 star) independent ML speed benchmarks. × Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM). DID I MENTION IT’S FAST? Engine Speed Python Sklearn Crashed R 90sec Vowpal Wabbit 15sec Spark 35sec But this benchmark was only single core!

23. × szilard/benchm-ml: widely cited (1127 star) independent ML speed benchmarks. × Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM). DID I MENTION IT’S FAST? Engine Speed Python Sklearn Crashed R 90sec Vowpal Wabbit 15sec Spark 35sec ...and none of the benchmarks include data load time! (VP has none.)

24. ...But what’s THIS ABOUT A PLATYPUS?

25. WHAT IS VOWPAL PLATYPUS? × An open source vehicle for productionizing Vowpal Wabbit in Python.

26. WHAT IS VOWPAL PLATYPUS? × An open source vehicle for productionizing Vowpal Wabbit in Python. × Train and predict on Python dictionaries instead of the obscure VW format.

27. WHAT IS VOWPAL PLATYPUS? × An open source vehicle for productionizing Vowpal Wabbit in Python. × Train and predict on Python dictionaries instead of the obscure VW format. × Easily use VW’s parallel features to go multicore and multi-machine.

28. WHAT IS VOWPAL PLATYPUS? × An open source vehicle for productionizing Vowpal Wabbit in Python. × Train and predict on Python dictionaries instead of the obscure VW format. × Easily use VW’s parallel features to go multicore and multi-machine. VW has been used on “terascale datasets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines.”

29. WHAT IS VOWPAL PLATYPUS? × An open source vehicle for productionizing Vowpal Wabbit in Python. × Train and predict on Python dictionaries instead of the obscure VW format. × Easily use VW’s parallel features to go multicore and multi-machine. ...so far VP has only been used on a maximum of 3 machines (combined 108 core), but we’re getting there...

30. DEMO #1!

31.

32.

33.

34.

35. DEMO #2!

36. DEMO #2!

37. DEMO #2! 27,279 movies & 138,494 users

38. DEMO #2! 27,279 movies & 138,494 users. 3,757,977,826 predictions... need to be made.

39. DEMO #2! 27,279 movies & 138,494 users. 3,757,977,826 predictions... need to be made. 21m47s total runtime on 3x c4.8xlarge (108 cores total). 342 nanoseconds per prediction (wall clock time).

40. THE END! (...OR IS IT?)