Infrastructure Agnostic Machine Learning Workload Deployment

Infrastructure Agnostic
Machine Learning
Workload Deployment
Abi Akogun
Data Science Consultant (MavenCode)
Charles Adetiloye
ML Platforms Engineer (MavenCode)

About MavenCode
MavenCode is an Artificial Intelligence Solutions company located in Dallas, Texas - We do
training, product development, and consulting services in the following areas:
● Provisioning Scalable Data Processing Pipelines on Cloud Infrastructure
● Development & Deployment of Machine Learning and Artificial Intelligence Platforms
● Streaming and Big Data Analytics Edge-IoT and Sensors

About The Presenters
Charles Adetiloye is an ML Platforms Engineer
at MavenCode. He has well over 15 years of
experience building large-scale, distributed
applications. He has extensive experience
working and consulting with several companies
implementing production grade ML and AI
platforms
twitter.com/cadetiloye
Abiodun Akogun is a Machine Learning and Data
Science Consultant at Mavencode. He has extensive
experience building and deploying large-scale Machine
Learning Applications in different industries that
include Healthcare, Finance, Telecommunications, and
Insurance. He has experience solving several business
problems using Data Analytics, Sentiment Analysis,
Topic Modelling, Named Entity Recognition(N.E.R),
Opinion Mining, Data Mining, Time Series, Spatial
Statistics and Marketing Analytics
twitter.com/akogz

Agenda
▪ Overview of Machine Learning Model Deployment
Workflow
▪ Various Approaches to model training,
management, and serving in the Cloud
▪ Deploying Machine Learning Workloads in the
Cloud
▪ Implementing Feature Storage backend for ML
model training
▪ Running Spark Workloads for ML training on
Kubernetes with Kubeflow

Overview of Machine Learning Deployment Workflow
Data
Sourcing
Pre
Processing
Feature
Engineering
Model
Training /
Evaluation
Model Scoring
/Management
Model
Inferencing

Machine Learning Workload Deployment
Data
Sourcing
Pre
Processing
Feature
Engineering
Model
Training /
Evaluation
Model Scoring
/Management
Model
Inferencing
Google Cloud AWS Azure On Prem

Machine Learning Deployment Effort
Data Verification
Configuration
Feature
Extraction
Data Validation
Machine Resource
Management
Serving
Infrastructure
Monitoring
Analysis Tool
Machine Learning Code
Data Preparation +
Storage
Efficient Compute
Resource Management

Overview of Machine Learning Deployment Workflow
Data
Sourcing
Pre
Processing
Feature
Engineering
Model
Training /
Evaluation
Model Scoring
/Management
Model
Inferencing
32%
10%
36%
2% 4%
16%

A Typical Machine Learning Developer Workflow
Data
Sourcing
Pre
Processing
Feature
Engineering
Model
Training /
Evaluation
Model
Scoring
/Management
Model
Inferencing
Azure
Storage
Google Storage
AWS S3
Storage
Raw Data Transformation Processed Data
Storage Compute
1 2
Google Cloud AI AWS Sage Maker Azure ML
Data Scientist / ML Engineers
works on pulling or processing
data first before starting ML
training on a Managed Cloud
Service
Raw Data Processing and
Transformation Pipeline
Cloud Training Platforms

What Enterprise Machine Learning Workflow In the
Cloud Looks Like!
Data
Sourcing
Pre
Processing
Feature
Engineering
Azure
Storage
Google Storage
AWS S3
Storage
Raw Data Transformation Processed Data
Storage Compute
1 2
Team A
Team B
Team C
Team D
Google Cloud AI
AWS SageMaker
AWS SageMaker
Azure ML
Running ML workflow across
the enterprise with multiple
teams using different Cloud
Provider technology stacks

Implementing Machine
Learning solutions in the
cloud comes at a cost, with
cost of Compute and
Storage on top of the list.

If we plan to be Cloud Neutral, can we abstract our
● Machine Learning Compute Workload→Kubernetes?
● Machine Storage → Feature Store?

Google Cloud AI AWS Sage Maker Azure ML
A Typical Machine Learning Developer Workflow
Data
Sourcing
Pre
Processing
Feature
Engineering
Model
Training /
Evaluation
Model Scoring
/Management
Model
Inferencing
Azure
Storage
Google Storage
AWS S3
Storage
Data Source Transformation Processed Data
Storage Compute
1 2

Towards A Cloud Neutral ML Deployment Environment
Data Sourcing Pre Processing
Feature
Engineering
Model Training /
Evaluation
Model Scoring
/Management
Model
Inferencing
Storage Compute
1 2
Feature Store
Kubernetes

Why the need for
Cloud Agnostic
Deployment
Infrastructure?

● Makes it easier to migrate workloads in a Hybrid Cloud Environment
● We are not tied to particular Cloud Infrastructure technology stack
● It’s easier to Implement best practice patterns and solutions
● Your team will have a common base denominator for all Enterprise ML workload
● Easy to control cost, manage utilization and forecast demand

Cloud Agnostic Machine Learning Development
Feature
Engineering
Model Training /
Evaluation
Model Scoring
/Management
Model
Inferencing
Storage Compute
1 2
Feature Store
Kubernetes
Azure Storage
Google Storage
AWS S3 Storage

What’s Feature Store All about?
A Feature is a measurable observable attribute that is part of the input to a
Machine Learning Model.
Model Training
X1
X2
X3
Xn
[Feature Vector]
Model

What’s Feature Store All about?
Model Training
X1
X2
X3
Xn
[Feature Vector]
Model
Model 1
Features are derived from
● Raw Datastore
● Streaming Datasource
● Aggregates of Raw Inputs
● Windows (mins, hourly, daily, weekly)

Features Change Over time!
Model Training
X1
X2
X3
Xn
X1
X2
X3
Xn
X1
X2
X3
Xn
Time

Machine Learning Feature Store
● Makes it easy to operationalize our ML workload, most importantly Data
Management and Storage for Model training
● Features can be shared easily amon teams running different Model
training pipelines
● We can get to version of datasets and track changes easily
● Consistency in Feature input attributes between Model Training and
Serving

● Offline Feature Store → Batching Training
● Online Feature Store → Inferencing / Serving
Types Of Feature Store

Implementing Offline Feature Storage with Apache Hudi
Azure
Storage
Google Storage
AWS S3
Storage
Streaming Source
Batch Job Operations
Datasource with
Streaming sources like
MQTT, Kafka, Pubsub
etc
Batch Operations on
Databases, FileStorage,
Distributed Storage etc
Feature Store
Workflow Scheduling
Orchestration with
Kubeflow Pipelines or
Airflow Dags on
Kubernetes
Feature Store
Implementation on any
of the Major Cloud
Storage

● A need for a Unified Platform where new data can be made available in addition to historical
data within minutes.
● The need for a quick computation (or derivation ) of Feature vectors in other to make them
available for our model input.
● Incremental Versioning of our Feature collections so that we can time-travel and use a
particular set of features for Model training.
● Our Hudi dataset can be stored in Azure, Google Cloud, AWS cloud storage layer.
● Easy to implement all our code and everything we need to do with Spark and PySpark
Why did we use Apache Hudi?

Getting Data into Hudi Feature Store with Kubeflow Pipeline
import kfp
from kfp import components
KafkaDatastreamer_op =
kfp.components.create_component_from_func(KafkaDatastreamer,base_image="python:3.7.1”)
ValidatorOnSchema_op =
kfp.components.create_component_from_func(ValidatorOnSchema,base_image="python:3.7.1")
PreProcessor_op =
kfp.components.create_component_from_func(PreProcessor,base_image="python:3.7.1")
HudiTableWriter_op= kfp.components.create_component_from_func(HudiTableWriter,
base_image="mavencode.io/spark:v3.1.1")

The Hudi Data Store writer
Configure the Spark Session
with the packages needed to
run hudi and avro
Hudi configuration Options
Writing the data into our
Hudi data store in the right
format

Feature
Engineering
Model Training /
Evaluation
Model Scoring
/Management
Model
Inferencing
Storage Compute
1 2
Feature Store
Kubernetes
Cloud Native ML Workload
Deployment with Operators
on Kubeflow
Cloud Native ML Training Deployment
● Containerized Workload
● Scalable + Can Run in Distributed Mode
● Efficient Compute Utilization
● Language Agnostic!

Machine Learning Operators with Kubeflow on
Kubernetes
● An Machine Learning Operator helps the deployment
monitoring and management a model training life-
cycle
● Some ML Operators found in Kubeflow are:
○ TF-operator → Tensorflow Job
○ Pytorch-operator → Pytorch Job
○ Xgboost-operator → Xgboost Job
○ Spark-operator → Spark and Spark ML Jobs

MLOps Model Training and Deployment Platform
Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook
Namespace Namespace Namespace Namespace
Auto-Scalable CPU Node Pool Auto-Scalable GPU Node Pool
Spark Operator Spark Operator
TensorFlow Operator Tensorflow Operator
Cloud Infrastructure Layer
Running
Auto Scaling Node Pools
Running Kubernetes
Machine Learning
Operators running with
Kubeflow
Feature Store

Using Spark Operator for Training ML Steps
PySpark
ML Code
Containerize
the Python
Code
Create
SparkApplication
Kubernetes YAML
Deployment
Apply
Deployment to
Kubernetes

Spark Operator on Kubernetes
API
Scheduler
OR OR OR
Spark Driver
Executors

Elastic Compute Resource ML Jobs
API
Scheduler
OR OR OR
kubectl apply -f ...

Deployment Configuration YAML
Spark Application Config
that describes the job and
the namespace where the
job will run
Container that will run our
Spark ML Code
Spark Drive and Executor
Configuration

Connecting to Feature Store with Kubeflow Pipeline

Cost comparison with Managed Cloud service on AWS
30%
100%
15s
66s
Compute Utilization Cost Compute Startup Uptime Team Agility & Productivity
6x Productivity
Managed Services Running on AWS
Kubeflow + S3 Feast Storage ML workload

Summary
● Implementing a Cloud neutral ML deployment approach
simplifies most of the complexities in a Multi-Cloud
environment
● After the initial hump, learning curve and the overall
team efficiency improves significantly
● Teams is not locked in to a particular Cloud
Infrastructure stack
● Easy to control cost and forecast future capacity
demands

Thank You!
If you are interested in learning more about how to run your
Machine Learning Workloads on any Cloud Infrastructure or
Onprem reach out to us
Drop us a mail hello@mavencode.com
Visit Us Online
https://www.mavencode.com
Follow Us
https://www.twitter.com/mavencode

Infrastructure Agnostic Machine Learning Workload Deployment

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Infrastructure Agnostic Machine Learning Workload Deployment

Ähnlich wie Infrastructure Agnostic Machine Learning Workload Deployment (20)

Mehr von Databricks

Mehr von Databricks (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Infrastructure Agnostic Machine Learning Workload Deployment