Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Chief Data Scientist @ Beeswax
@sizrailev / bit.ly/MLatScale / sergei@beeswax.com / www.beeswax.com
Sergei Izrailev
Design Patterns
for Machine Learning
in Production
Motivation
• A widespread need to leverage in-house data
• Data science expertise is available
• And yet, it seems to be t...
• Beeswax
• Beeswax is an ad tech startup; 40 employees in NYC and London
• Founded by three ex-Googlers
• Real-time biddi...
Overall process Discovery
Research
Prototype
Production
Value Cost
Problem
Statement
Constraints
Start with defining the problem
Problem statement
• Is this the right problem to solve?
• Suppose, we’ve solved the stated problem - what’s the value?
• I...
Define constraints
• Existing production environment architecture
• Technology stack
• Available people and their skills
•...
Dimensions of ML system scalability
• Volume: how much data do we need to process?
• Velocity: how quickly does the data c...
Technical Design of ML Systems
Machine learning systems
ApplyBuild
Model
Data Sources
Labeled data Unlabeled data
Consumer
ML system design
Build Apply
Model
Data Sources
Labeled data Unlabeled data
Consumer
?
? ??
?
?
?
Model deployment
Data Sources
Labeled data Unlabeled data
? ?
Consumer
? ?
Build
Model
?
?
Apply
?
Model deployment
Build
Trans-
form
Train
Code + Data
Apply
Predict
Trans-
form
• Data transformations must be the same in ...
Interface between building and scoring
• In-memory - model is never persisted, train then score
• single application, also...
Scoring systems
Build
Model
Data Sources
Labeled data Unlabeled data
?
?
?
?
Apply Consumer
?? ?
Batch processing
ConsumerApply
Predictions Predictions
New data Trans-
form
Predict
Batch features; consumer predicts
New data
ConsumerApply
Model
Features
Predict
Features
Trans-
form
Single-row predictions
Consumer
Model
Predict
Trans-
form
New data
A service for a single row consumer
ConsumerApply
New data
Prediction
New data
Trans-
form
Predict
A service for a single row consumer
ConsumerApply
New data
Prediction
New data
Trans-
form
Predict
Cache
Cached predictions
ConsumerApply
Cache
Predictions
PredictionKey
Default
New data
Trans-
form
Predict
New data
Predictions
Near-real-time
ConsumerApply
Cache
Predictions
PredictionKey
Default
New data
Trans-
form
Predict
Add data
Predictions
New...
Cached features
ConsumerApply
Cache
Features
FeaturesKey
Model
Features
Predict
New data Trans-
form
New data
Evolution of ML systems
Research
Build
Sample
Transform
& generate
features
Collect
labels
Train &
Test
Extract &
Clean
Prototype
Build
Sample
Transform
generate
features
Collect
labels
Train &
Test
Extract &
Clean
Apply
Transform
Generate
fe...
Production
Build Apply
Serialized
Model
Visualization CI/CD
Monitoring
& Alerts
Model
Evaluation
Change
Mgmt
Error
Handlin...
Production
Fault-tolerant systems
• We should expect problems
• models don’t converge; input data changes; bugs
• Produce ...
Conway's law:
"Any piece of software reflects the
organizational structure that produced it."
People Questions
• Who is developing training?
• Who is developing scoring?
• Who is responsible for training in productio...
Product management
Data science
Data engineering
Application engineering (RT
server-side applications, client-
side applic...
Data scientists
Data scientists as consumers
Apply ConsumerBuild
Serialized
Model
EngineersData scientists
Over the wall
Build Apply
Serialized
Model
Build Apply
Serialized
Model
Consumer
EngineersData scientists
Deliver predictions only (aka “black box”)
Build Apply
Serialized
Model
Consumer
Predictions
EngineersData scientists
Deliver data (serialized model)
Apply ConsumerBuild
Serialized
Model
Scoring
Code
EngineersData scientists
Deliver code and data
Apply ConsumerBuild
Serialized
Model
Scoring
Code
Team: Data Scientists + Engineers
Cross-functional team
Apply ConsumerBuild
Serialized
Model
Scoring
Code
EngineersData scientists
ML platform
Build Apply
Serialized
Model
Consumer
Apply
Build
ML platform
Serialized
Model
Visualization CI/CD
Monitoring
& Alerts
Model
Evaluation
Change
Mgmt
Error
Handling
Data
Vali...
Conclusion
• Find the right problem
• Define constraints
• Design components and interfaces
• Take into account organizati...
Questions?
Yes, we are hiring…
sergei@beeswax.com
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief Data Scientist, BeeswaxIO
Nächste SlideShare
Wird geladen in …5
×

Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief Data Scientist, BeeswaxIO

2.032 Aufrufe

Veröffentlicht am

Presented at #H2OWorld 2017 in Mountain View, CA.

Enjoy the video: https://youtu.be/-rGRHrED94Y.

Learn more about H2O.ai: https://www.h2o.ai/.

Follow @h2oai: https://twitter.com/h2oai.

- - -

Abstract:
Most machine learning systems enable two essential processes: creating a model and applying the model in a repeatable and controlled fashion. These two processes are interrelated and pose technological and organizational challenges as they evolve from research to prototype to production. This presentation outlines common design patterns for tackling such challenges while implementing machine learning in a production environment.

Sergei's Bio:
Dr. Sergei Izrailev is Chief Data Scientist at BeeswaxIO, where he is responsible for data strategy and building AI applications powering the next generation of real-time bidding technology. Before Beeswax, Sergei led data science teams at Integral Ad Science and Collective, where he focused on architecture, development and scaling of data science based advertising technology products. Prior to advertising, Sergei was a quant/trader and developed trading strategies and portfolio optimization methodologies. Previously, he worked as a senior scientist at Johnson & Johnson, where he developed intelligent tools for structure-based drug discovery. Sergei holds a Ph.D. in Physics and Master of Computer Science degrees from the University of Illinois at Urbana-Champaign.

Veröffentlicht in: Technologie
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier

Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief Data Scientist, BeeswaxIO

  1. 1. Chief Data Scientist @ Beeswax @sizrailev / bit.ly/MLatScale / sergei@beeswax.com / www.beeswax.com Sergei Izrailev
  2. 2. Design Patterns for Machine Learning in Production
  3. 3. Motivation • A widespread need to leverage in-house data • Data science expertise is available • And yet, it seems to be too hard to extract value from ML
  4. 4. • Beeswax • Beeswax is an ad tech startup; 40 employees in NYC and London • Founded by three ex-Googlers • Real-time bidding (RTB) platform for buying online ads (1M+ QPS) • Platform tailored for customers to leverage in-house data science • Myself • Production AI systems in Pharma, Finance and Ad Tech • Interested in both technology and organizations • bit.ly/MLatScale About
  5. 5. Overall process Discovery Research Prototype Production Value Cost Problem Statement Constraints
  6. 6. Start with defining the problem
  7. 7. Problem statement • Is this the right problem to solve? • Suppose, we’ve solved the stated problem - what’s the value? • Is ML the right tool to solve the problem? • What are the constraints?
  8. 8. Define constraints • Existing production environment architecture • Technology stack • Available people and their skills • Requirements for scale
  9. 9. Dimensions of ML system scalability • Volume: how much data do we need to process? • Velocity: how quickly does the data change? • Variety: what are the types of data, models, and applications? • Veracity: how accurate are our models? • Value: how does it matter to the ML consumer? • Viability: do the benefits outweigh the costs?
  10. 10. Technical Design of ML Systems
  11. 11. Machine learning systems ApplyBuild Model Data Sources Labeled data Unlabeled data Consumer
  12. 12. ML system design Build Apply Model Data Sources Labeled data Unlabeled data Consumer ? ? ?? ? ? ?
  13. 13. Model deployment Data Sources Labeled data Unlabeled data ? ? Consumer ? ? Build Model ? ? Apply ?
  14. 14. Model deployment Build Trans- form Train Code + Data Apply Predict Trans- form • Data transformations must be the same in training and scoring • Some transformations are “models” (PCA, top N, TF-IDF) • Hence, most ML pipelines are DAGs • These DAGs must be reproduced in production scoring
  15. 15. Interface between building and scoring • In-memory - model is never persisted, train then score • single application, also streaming • Data only - linear coefficients, PMML, etc • code is independent • Serialized objects - Pickle, R, Spark, custom • reuse code • Code + Data - e.g., H2O’s POJO • code is generated
  16. 16. Scoring systems Build Model Data Sources Labeled data Unlabeled data ? ? ? ? Apply Consumer ?? ?
  17. 17. Batch processing ConsumerApply Predictions Predictions New data Trans- form Predict
  18. 18. Batch features; consumer predicts New data ConsumerApply Model Features Predict Features Trans- form
  19. 19. Single-row predictions Consumer Model Predict Trans- form New data
  20. 20. A service for a single row consumer ConsumerApply New data Prediction New data Trans- form Predict
  21. 21. A service for a single row consumer ConsumerApply New data Prediction New data Trans- form Predict Cache
  22. 22. Cached predictions ConsumerApply Cache Predictions PredictionKey Default New data Trans- form Predict New data Predictions
  23. 23. Near-real-time ConsumerApply Cache Predictions PredictionKey Default New data Trans- form Predict Add data Predictions New data
  24. 24. Cached features ConsumerApply Cache Features FeaturesKey Model Features Predict New data Trans- form New data
  25. 25. Evolution of ML systems
  26. 26. Research Build Sample Transform & generate features Collect labels Train & Test Extract & Clean
  27. 27. Prototype Build Sample Transform generate features Collect labels Train & Test Extract & Clean Apply Transform Generate features Predict Evaluate Extract & Clean Serialized Model
  28. 28. Production Build Apply Serialized Model Visualization CI/CD Monitoring & Alerts Model Evaluation Change Mgmt Error Handling Data Validation Versioning Security & Access Reporting Testing & QA Logging
  29. 29. Production Fault-tolerant systems • We should expect problems • models don’t converge; input data changes; bugs • Produce acceptable results - even when something fails • Handle predictable error conditions automatically • Minimal human intervention • Easy diagnostics and recovery in case of fatal errors
  30. 30. Conway's law: "Any piece of software reflects the organizational structure that produced it."
  31. 31. People Questions • Who is developing training? • Who is developing scoring? • Who is responsible for training in production? • Who is responsible for scoring in production? • Who deploys new models and model updates? • Who is responsible for quality control?
  32. 32. Product management Data science Data engineering Application engineering (RT server-side applications, client- side applications) UX People’s Functions • Front-end development • Data collection (e.g., logging) • Code deployment • Testing and QA • Infrastructure provisioning • Support
  33. 33. Data scientists Data scientists as consumers Apply ConsumerBuild Serialized Model
  34. 34. EngineersData scientists Over the wall Build Apply Serialized Model Build Apply Serialized Model Consumer
  35. 35. EngineersData scientists Deliver predictions only (aka “black box”) Build Apply Serialized Model Consumer Predictions
  36. 36. EngineersData scientists Deliver data (serialized model) Apply ConsumerBuild Serialized Model Scoring Code
  37. 37. EngineersData scientists Deliver code and data Apply ConsumerBuild Serialized Model Scoring Code
  38. 38. Team: Data Scientists + Engineers Cross-functional team Apply ConsumerBuild Serialized Model Scoring Code
  39. 39. EngineersData scientists ML platform Build Apply Serialized Model Consumer Apply Build
  40. 40. ML platform Serialized Model Visualization CI/CD Monitoring & Alerts Model Evaluation Change Mgmt Error Handling Data Validation Versioning Security & Access Reporting Testing & QA Logging ApplyBuild
  41. 41. Conclusion • Find the right problem • Define constraints • Design components and interfaces • Take into account organizational constraints • Production can’t be an afterthought • The process is a lot of work, but it’s not rocket science
  42. 42. Questions? Yes, we are hiring… sergei@beeswax.com

×