
Hydrosphere.io for ODSC: Webinar on Kubeflow


Webinar video: https://www.youtube.com/watch?v=Y3_fcJBgpMw

Kubeflow and Beyond: Automation of Model Training, Deployment, Testing, Monitoring, and Retraining

Speakers:

Stepan Pushkarev, CTO, Hydrosphere.io, and Ilnur Garifullin, ML Engineer, Hydrosphere.io



Abstract: A typical workflow for training models and delivering them to production involves a great deal of manual work: building a Docker image and deploying it to a Kubernetes cluster, packing the model into a Python package and installing it into your Python application, or even editing Java classes with the trained weights and recompiling the whole project. All of this should then be followed by testing the model's performance. It can hardly be called "continuous delivery" if you do it all by hand. Imagine running the whole assemble/train/deploy/test/run process with a single command in your terminal. In this webinar, we present a way to combine data gathering, model training, model deployment, and model testing into a single flow and run it with a single command.



  1. Train and deliver machine learning models to production with a single command. Stepan Pushkarev, Ilnur Garifullin
  2. Today's webinar overview: 1. Machine Learning Workflow; 2. Tools overview (a. Kubeflow, b. Hydrosphere.io); 3. Deep Dive into Automation (a. Steps definition, b. Steps automation)
  3. Machine Learning Workflow
  4. ML Workflow: 1. Research 2. Data Preparation 3. Model Training 4. Model Cataloguing 5. Model Deployment 6. Model Integration Testing 7. Production Inferencing 8. Model Performance Monitoring 9. Model Maintenance
  5. Step 1: Research ● Defining an objective ● Defining requirements ● Defining methods ● Defining data sources
  6. Step 2: Data Preparation ● Collecting data ● Preparing data: ○ Cleaning ○ Feature engineering ○ Transformation ● Important: to be reused for inferencing
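Slide 6's "to be reused for inferencing" point is worth making concrete: if training and serving apply even slightly different transforms, production accuracy silently degrades. A minimal sketch under that assumption (the function name and the [0, 1] pixel scaling are illustrative, not from the deck):

```python
# Keep preprocessing in one shared module so the exact same transform
# runs at training time and at inference time.

def preprocess(pixels):
    """Scale raw 0-255 MNIST pixel values to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]

# Training and serving both import this one function, so the model never
# sees differently-scaled inputs in production than it saw in training.
if __name__ == "__main__":
    print(preprocess([0, 128, 255]))  # -> values scaled into [0, 1]
```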
  7. Step 3: Model Training ● Building the model ● Training the model ● Evaluating the model ● Tuning hyper-parameters ● Versioning training data
  8. Step 4: Model Cataloguing ● Metadata extraction: ○ Graph definition ○ Weights ○ Training data version / stats ○ Other dependencies (look-up vocabulary, etc.) ● Indexing the model's binaries ● Versioning a model artifact ● Storing the model in a repository
  9. Step 5: Model Deployment ● Preparing infrastructure for the model ● Preparing a runtime for the model ● Deploying the model server ● Exposing API endpoints to the model ● Model integration
  10. Step 6: Model Integration Testing ● Performing integration tests ● Replaying a golden data set ● Replaying edge cases ● Replaying recent traffic ● Asserting results
  11. Step 7: Production Inferencing ● A/B & canary deployment ● Model scaling
  12. Step 8: Model Performance Monitoring ● System metrics monitoring ● Model metrics tracking ● Model comparison ● Concept drift monitoring ● Anomaly detection ● Data profiling
  13. Step 9: Model Maintenance ● Alerts & troubleshooting ● Root cause analysis ● Edge case exploration ● Retraining dataset subsampling ● Retraining
  14. The Toolset
  15. Hydrosphere.io: The Machine Learning Model Management Platform. Kubeflow: The Machine Learning Toolkit for Kubernetes.
  16. What is Kubeflow? ● Began as a Kubernetes template / blueprint for running TensorFlow ● Evolved into a "toolkit": loosely coupled tools and blueprints for ML on Kubernetes
  17. What is Hydrosphere.io? Hydrosphere.io is a platform for ML model management. ● An exact value-add "tool", part of the toolkit ● Open source ● Augments cataloguing, deployment, inferencing, monitoring, and maintenance
  18. Tools Landscape (diagram): tools such as ModelDB mapped across Research → Data Prep → Training → Cataloguing → Deployment → Integration Testing → Production Inferencing → Performance Monitoring → Model Maintenance
  19. Deep Dive into Workflow Automation, Part 1: Creating Executables
  20. Step 1: Research
  21. Step 1: Research (MNIST) ● Objective: given an image of a handwritten digit, predict which digit it is ● Requirements: easy model export ● Tools and methods: TensorFlow Estimator API ● Data: the MNIST dataset
  22. Step 2: Data Preparation
  23. Step 2: Data Preparation — Building Container
      Dockerfile:
        FROM python:3.6-slim
        RUN pip install numpy==1.14.3 Pillow==5.2.0
        ADD ./download.py /src/
        WORKDIR /src/
        ENTRYPOINT [ "python", "download.py" ]
      $ docker build -t {username}/mnist-pipeline-download .
      $ docker push {username}/mnist-pipeline-download
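The deck shows the container around download.py but not the script itself. As an illustration, decoding the MNIST files needs only the stdlib; the function name below is an assumption, while the IDX3 header layout (big-endian magic 0x00000803, then image count, rows, and columns as uint32) is the published MNIST format:

```python
# Hypothetical sketch of a download.py decoding step: parse an MNIST
# IDX3 image buffer into its dimensions and raw pixel bytes.
import struct

def parse_idx_images(data: bytes):
    """Decode an IDX3 image buffer into (count, rows, cols, pixel bytes)."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    assert magic == 0x00000803, "not an IDX3 image file"
    return count, rows, cols, data[16:]

if __name__ == "__main__":
    # Build a tiny synthetic 1x2x2 IDX buffer instead of hitting the network.
    buf = struct.pack(">IIII", 0x00000803, 1, 2, 2) + bytes([0, 64, 128, 255])
    count, rows, cols, pixels = parse_idx_images(buf)
    print(count, rows, cols, list(pixels))  # 1 2 2 [0, 64, 128, 255]
```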
  24. Step 3: Model Training — Building a Model
  25. Step 3: Model Training
  26. Step 3.5: Model Training and Saving
  27. Step 4: Model Cataloguing. DIY: instrument the training pipeline; store metadata; zip the model and metadata; store in S3, or push to Artifactory, or push to git
  28. Step 4: Model Cataloguing. DIY (as above); ModelDB: a Python DSL to sync the model, test data, and metrics, plus a nice UI
  29. Step 4: Model Cataloguing. DIY and ModelDB (as above); Hydrosphere.io: $ hs upload /models/mnist/ and $ hs profile push /data/mnist/
  30. Step 4: Model Cataloguing. Hydrosphere.io versions the model, extracts metadata, builds a model Docker image, and stores it in a Docker registry: $ hs upload /models/mnist/ ; $ hs profile push /data/mnist/
  31. Step 4: Model Cataloguing
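The DIY cataloguing path on slides 27-29 (store metadata, zip the model plus metadata, version the artifact) can be sketched with the stdlib alone. The function name, paths, and metadata fields below are illustrative assumptions, not Hydrosphere's implementation:

```python
# Hypothetical sketch of DIY model cataloguing: zip a model directory
# together with a metadata.json, and derive a short content-hash version
# tag to use as an S3 key suffix or git tag.
import hashlib
import json
import zipfile
from pathlib import Path

def catalogue(model_dir: str, metadata: dict, out_zip: str) -> str:
    """Zip a model directory plus metadata.json; return a content version."""
    model = Path(model_dir)
    out = Path(out_zip).resolve()
    with zipfile.ZipFile(out, "w") as zf:
        for f in sorted(model.rglob("*")):
            # Skip the archive itself in case it lives inside model_dir.
            if f.is_file() and f.resolve() != out:
                zf.write(f, f.relative_to(model))
        zf.writestr("metadata.json", json.dumps(metadata, sort_keys=True))
    return hashlib.sha256(out.read_bytes()).hexdigest()[:12]

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        Path(src, "weights.bin").write_bytes(b"\x00\x01")
        version = catalogue(src, {"framework": "tensorflow"}, str(Path(dst, "model.zip")))
        print(version)  # 12-hex-char version tag
```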
  32. Step 5: Model Deployment. DIY: implement a model server (Flask app); look up the model; dockerize; add Kube configs and tags; expose an API (HTTP, gRPC, batch, streaming)
  33. Step 5: Model Deployment. DIY (as above); niche tools: TensorFlow Serving, PyTorch Serving, Nvidia TensorRT Serving
  34. Step 5: Model Deployment. DIY and niche tools (as above); Hydrosphere.io:
      $ hs apply -f - << EOF
      kind: Application
      name: "MyPredictionApp"
      singular:
        model: mnist:1
        runtime: "serving-runtime-python:1.7.0-latest"
      EOF
  35. Step 5: Model Deployment. Hydrosphere.io: the manifest ties together metadata, a runtime, and the model; the model is launched on Kube with HTTP, gRPC, and Kafka APIs
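The DIY deployment path mentions implementing a model server as a Flask app. A stdlib-only sketch of the same idea, where the endpoint shape and the constant "always predict digit 0" stand-in model are assumptions:

```python
# Hypothetical sketch of a DIY model server: one JSON POST endpoint
# wrapping a predict function, using only the standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(pixels):
    # Stand-in for real inference: always answer the digit 0.
    return {"digit": 0, "confidence": 1.0}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = predict(json.loads(body)["pixels"])
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve(port: int = 8080):
    """Block and serve predictions, e.g. as a container entrypoint."""
    HTTPServer(("", port), Handler).serve_forever()
```

In the DIY story this server would then be dockerized and given Kube configs; the point of the Hydrosphere path above is that `hs apply` replaces all of this boilerplate.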
  36. Step 6: Model Integration Testing. DIY: implement a testing script; dockerize and add to Kube; replay a golden data set; replay edge cases; replay recent traffic; assert results
  37. Step 6: Model Integration Testing (the same DIY steps, illustrated)
  38. Step 6: Model Integration Testing. DIY (as above); Hydrosphere Serving (Q2 2019): $ hs test -f /test/dataset ; $ hs test replay anomalies ; $ hs test replay <from_date>
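The golden-data-set replay in the DIY testing path boils down to: run the deployed model over inputs with known labels and assert an accuracy floor. A sketch, where predict_fn, the dataset shape, and the 0.95 threshold are all assumptions:

```python
# Hypothetical sketch of a golden-data-set integration test: replay
# labelled inputs through the model and gate the deployment on accuracy.

def replay_golden(predict_fn, golden):
    """golden: list of (input, expected_label); returns accuracy in [0, 1]."""
    hits = sum(1 for x, y in golden if predict_fn(x) == y)
    return hits / len(golden)

if __name__ == "__main__":
    golden = [([0.0], 0), ([1.0], 1), ([0.9], 1), ([0.1], 0)]
    acc = replay_golden(lambda x: round(x[0]), golden)
    assert acc >= 0.95, f"model below golden-set accuracy floor: {acc:.2%}"
    print(acc)  # 1.0
```

The same loop, pointed at recorded production traffic or known edge cases instead of the golden set, covers the other two replay bullets.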
  39. Step 7: Production Inference
  40. Step 8: Model Performance Monitoring
  41. Step 9: Model Maintenance (diagram): an alert fires, accuracy drops, the data changed, but what exactly?
  42. Step 9: Model Maintenance: explainability of monitoring alerts
  43. Deep Dive into Workflow Automation, Part 2: Defining a Kubeflow Pipeline
  44. Defining a Kubeflow Pipeline
  45. Parametrizing the function
  46. Stage 1: Defining the Downloading Container
  47. Stage 1: Mounting Volumes
  48. Stage 2: Defining the Training Container
  49. Stage 2: Defining the Training Container (continued)
  50. Stage 3: Defining the Uploading Container
  51. Stage 4: Defining the Deploying Container
  52. Stage 5: Defining the Testing Container
  53. Stage 6: Defining the Cleaning Container
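Slides 44-53 define the pipeline in the Kubeflow Pipelines Python DSL (the code is in the slide images and in the linked repo); what that DSL compiles down to, as the next slide shows, is an Argo Workflow. A stdlib sketch of the shape of such a workflow for the six stages above; the workflow name and the image names (beyond the download image built earlier) are assumptions:

```python
# Hypothetical sketch of the Argo Workflow a Kubeflow pipeline compiles
# to: six container steps (download, train, upload, deploy, test, clean)
# executed sequentially under one entrypoint template.
import json

STAGES = ["download", "train", "upload", "deploy", "test", "clean"]

def workflow(username: str) -> dict:
    # One inner list per step group => the groups run one after another.
    steps = [[{"name": s, "template": s}] for s in STAGES]
    templates = [{"name": "pipeline", "steps": steps}] + [
        {"name": s, "container": {"image": f"{username}/mnist-pipeline-{s}"}}
        for s in STAGES
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "mnist-pipeline-"},
        "spec": {"entrypoint": "pipeline", "templates": templates},
    }

if __name__ == "__main__":
    # Argo accepts JSON as well as YAML, so this dict is submittable as-is.
    print(json.dumps(workflow("someuser"), indent=2)[:120])
```

In practice you would not write this dict by hand; the kfp compiler on the next slide produces the equivalent YAML from the Python DSL.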
  54. Compiling the Pipeline:
      $ python pipeline.py pipeline.tar.gz
      $ tar -xvf pipeline.tar.gz  # produces pipeline.yaml
  55. Executing the Pipeline with a single command:
      $ argo submit pipeline.yaml --watch
  56. Executing the Pipeline from the UI
  57. Source code: https://github.com/Hydrospheredata/hydro-serving-kubeflow-demo
  58. Contact Us
      General inquiries: hydrosphere.io, info@hydrosphere.io, linkedin.com/company/hydrospherebigdata, twitter.com/hydrospheredata, facebook.com/hydrosphere.io
      Address: 125 University Avenue, Suite 290, Palo Alto, CA 94301; tel: 650-521-7875
      Business and technical: Stepan Pushkarev, spushkarev@hydrosphere.io; Ilnur Garifullin, igarifullin@provectus.com
