Charles is a Lead ML platforms engineer at MavenCode. He has well over 15 years of experience building large-scale, distributed applications. Topic: Enterprise MLOps in Practice. How to efficiently get your Machine Learning Models from Notebooks to Production!
2. About MavenCode
MavenCode Confidential and Proprietary
MavenCode is an Artificial Intelligence Solutions company located in Southlake, Texas. We do training, product
development and consulting services with specialization in
● Provisioning Scalable AI and ML platforms - OnPrem and in the Cloud
● Deployment & Development of Machine Learning Platforms - OnPrem and in the Cloud
● Enterprise Feature Store Development and Management
● Model Management and Governance
● Streaming Data Analytics and Edge IoT Model Deployments
● Document Understanding and Natural Language Processing with Artificial Intelligence
3. Industry Verticals We Serve
Retail Industry
● Recommendation Engines
● Customer Management
● Demand Analysis and Planning
● Logistics and Supply Management
Insurance Industry
● AI Infrastructure Tooling
● Claims Analysis and Processing
● Document Processing
● Damage Detection and Identification
Automotive Industry
● AI Infrastructure Tooling
● Near Real Time Car Telemetry Analysis
● Preemptive Maintenance Recommendation
Healthcare Industry
● Medical insurance claim analysis
● X-ray image analysis and diagnostics
● Data Driven decision making enablement
Energy Industry
● Capacity Planning and Demand Forecasting
● Preemptive Equipment Maintenance
Travel & Hospitality Industry
● Planning and Logistics
● Customer Recommendations
● Logistics, Planning and Forecasting
Telecom Industry
● Utilization Forecasting
● Churn Rate Analysis
● Preemptive Maintenance of Equipment
Agriculture Industry
● Precision Farming
● Mechanical Utilization Rate and Planning
● Capacity Planning
6. Agenda
1 Overview of Machine Learning Ops
2 MLOps Roles
3 MLOps Landscape
4 Discuss a Use Case
5 Questions and Answers
10. MLOps is not easy!
Launching a rocket is easy, but the ongoing operation of guiding it successfully into space afterward is hard.
11.
“It took me 3 weeks to develop the model. It’s been > 11 months, and it’s still
not deployed”
@ginablaber
“On average, 40% of companies said it takes more than a month to deploy
ML models into production”
thenewstack.io
12. What is Machine Learning Operations?
Machine Learning Operations, or MLOps, helps simplify the processes involved in deploying
machine learning models, bridging the operations team and the machine learning researchers or data
scientists in the organization.
13. So we can say with MLOps ...
● The goal is to standardize and streamline Machine Learning life cycle management
● MLOps is a critical component of any successful Machine Learning project in the enterprise
● Organizations generate long-term value and mitigate the risk associated with Machine Learning projects
14. Challenges In Enterprise ML
Reproducibility
● It is not easy to reproduce ML model output on each iterative run
● Constantly changing training data
● Environment configuration consistency issues
Reusability
● Training pipelines are not componentized for reusability
● No well-defined way of doing model versioning and tagging
● Collaboration and sharing of source code is not well defined
Manageability
● Managing model deployment and serving between environments is difficult
● Versioning and tracking model artifacts is very difficult and complex
● No defined way to visually track updates and changes
Automation
● Much of the deployment process is still manual
● Steps needed to update model parameters are not automated
● Most data science teams are not equipped with the right knowledge to take models to production
16. What People Think about Machine Learning
Machine Learning Code
17. Hidden Technical Debt of ML Deployment
Data Verification
Configuration
Feature Extraction
Data Validation
Machine Resource Management
Serving Infrastructure
Monitoring
Analysis Tool
Machine Learning Code
18. MLOps Roles and Responsibilities
ML Architects
● Ensure a scalable and flexible environment for ML model pipelines
● Introduce new technologies that improve ML model performance in production
● Identify bottlenecks in the production system and pinpoint solutions for long-term improvements
Model Risk Managers/Auditors
● Analyze initial business goals and model outcomes
● Minimize the overall risk that results from ML models in production
● Ensure compliance with internal and external requirements before pushing ML models to production
DevOps
● Build and run operational systems
● Test systems for security, performance and availability
● CI/CD pipeline management
ML Engineers
● Integrate ML models into the company’s applications
● Ensure ML models work seamlessly with non-ML applications
● Maintain functional ML models in production
Data Engineers
● Identify the right data for a project
● Optimize the retrieval and use of data to power ML models
● Resolve underlying issues in data pipelines
Data Scientists
● Build models that address business needs
● Deliver operationalizable models for the production environment
● Assess model quality
Subject Matter Experts
● Provide business questions for framing ML models
● Define business KPIs to be achieved
● Evaluate model performance
19. ML Team Workflow
1 Develop Models: Business Questions, Data Acquisition, Data Preparation, Feature Engineering, Model Training/Experimentation, Model Evaluation and Comparison
2 Prepare for Production: Runtime Environment, Risk Evaluation, QA, Scalability, Containerization, Continuous Integration
3 Development to Production
4 Monitoring & Feedback: Logging/Alerting, Input Drift Tracking, Online Evaluation, Performance Drift
Roles involved: Subject Matter Experts, Data Scientists, Data Engineers, Software Engineers, ML Architects, DevOps, Model Risk Managers/Auditors
21. Machine Learning Pipeline
Data Extraction
Data Preparation & Analysis
Data QA and Validation
Feature Engineering
Streaming Source: datasources such as MQTT, Kafka, Pub/Sub, etc.
Batch Job Operations: batch operations on databases, file storage, distributed storage, etc.
Model Training/Validation
Model Training
Model Serving
Model Versioning
Prediction Service
Monitoring
Logging
App Integration
Deployment / Inferencing
22. Typical ML Engineer or Data Scientist Workflow
Workflow stages: Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Storage: Azure Storage, Google Storage, AWS S3 Storage
Raw Data Processing and Transformation Pipeline: Raw Data → Transformation → Processed Data
Compute (Cloud Training Platforms): GCP Vertex, AWS SageMaker, Azure ML, on-prem Kubeflow (KF)
Data Scientists / ML Engineers work on pulling or processing data first, before starting ML training on a managed cloud service.
23. At scale, it gets complex ...
Running ML workflows across the enterprise, with multiple teams using different cloud provider technology stacks
Teams: Team A, Team B, Team C, Team D
Platforms: Google Cloud AI, AWS SageMaker, KF on prem, Azure ML
Shared stages: Data Sourcing → Pre Processing → Feature Engineering
Storage: Azure Storage, Google Storage, AWS S3 Storage
Compute: Raw Data → Transformation → Processed Data
24. To simplify the complexity, can we abstract our ML pipeline?
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
1 Storage: Feature Store
2 Compute: Kubernetes
25. To simplify the complexity, can we abstract our ML pipeline?
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
1 Storage: Feature Store
- Vertex AI Feature Store (Managed Service)
- Feast
- Databricks Feature Store
2 Compute: Kubeflow on Kubernetes, Vertex AI
27. What’s Feature Store All About
A feature is a measurable, observable attribute that is part of the input to a Machine Learning model.
X1, X2, X3, ..., Xn [Feature Vector] → Model Training → Model
28. What’s Feature Store All About
X1, X2, X3, ..., Xn [Feature Vector] → Model Training → Model
Features are derived from
● Raw Datastore
● Streaming Datasource
● Aggregates of Raw Inputs
● Windows (minutes, hourly, daily, weekly)
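As a minimal sketch of how a windowed aggregate feature could be derived from raw events, the following groups timestamped values into fixed windows and averages them. The event layout and the function name `window_mean` are illustrative assumptions, not part of any feature store API.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_mean(events, window):
    """Average event values per fixed time window.

    events: iterable of (timestamp, value) pairs (hypothetical raw input).
    window: a timedelta giving the window size, e.g. one hour or one day.
    Returns {window_start: mean_value}.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in events:
        # Floor the timestamp to the start of its window.
        offset = (ts - epoch) // window
        buckets[epoch + offset * window].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Made-up raw events for illustration.
raw = [
    (datetime(2021, 6, 1, 9, 5), 10.0),
    (datetime(2021, 6, 1, 9, 50), 20.0),
    (datetime(2021, 6, 1, 10, 15), 40.0),
]
hourly = window_mean(raw, timedelta(hours=1))
daily = window_mean(raw, timedelta(days=1))
```

The same raw events yield different feature values depending on the window size, which is why the window is usually treated as part of the feature definition.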
29. Features Change Over time!
[Diagram: feature vectors (X1, X2, X3, ..., Xn) shown at successive points along a Time axis, feeding Model Training]
30. Feature Stores In MLOps
● Make it easy to operationalize our ML workload, most importantly data management and storage for model training
● Features can be shared easily among teams running different model training pipelines
● Datasets can be versioned and changes tracked easily
● Feature input attributes stay consistent between model training and serving
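To illustrate versioning and training/serving consistency, here is a toy in-memory sketch. The class `InMemoryFeatureStore`, the entity id and the feature name are hypothetical; a real feature store such as Feast exposes a different API.

```python
import bisect
from collections import defaultdict


class InMemoryFeatureStore:
    """Toy feature store keeping every timestamped version of a feature row."""

    def __init__(self):
        # entity_id -> list of (timestamp, features), kept sorted by timestamp
        self._rows = defaultdict(list)

    def write(self, entity_id, timestamp, features):
        rows = self._rows[entity_id]
        # Search on timestamps only so feature dicts are never compared.
        keys = [ts for ts, _ in rows]
        rows.insert(bisect.bisect_right(keys, timestamp), (timestamp, dict(features)))

    def read(self, entity_id, as_of):
        """Return the latest features written at or before `as_of`, or None."""
        rows = self._rows.get(entity_id, [])
        keys = [ts for ts, _ in rows]
        idx = bisect.bisect_right(keys, as_of)
        return rows[idx - 1][1] if idx else None


store = InMemoryFeatureStore()
store.write("customer-1", 1, {"flights_90d": 2})
store.write("customer-1", 5, {"flights_90d": 4})

training_row = store.read("customer-1", as_of=3)   # features as they were at t=3
serving_row = store.read("customer-1", as_of=10)   # latest features
```

Because training and serving both go through the same `read` call, they see the same feature definitions; only the `as_of` time differs.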
31. Getting Data into a Feature Store
import kfp
from kfp import components

# KafkaDatastreamer, ValidatorOnSchema, PreProcessor and FeatureStoreWriter are
# plain Python functions defined elsewhere; each one is wrapped as a pipeline
# component. create_component_from_func is part of the KFP v1 SDK.
KafkaDatastreamer_op = components.create_component_from_func(
    KafkaDatastreamer, base_image="python:3.7.1")
ValidatorOnSchema_op = components.create_component_from_func(
    ValidatorOnSchema, base_image="python:3.7.1")
PreProcessor_op = components.create_component_from_func(
    PreProcessor, base_image="python:3.7.1")
# The writer runs on a Spark image instead of the plain Python image.
FeatureStoreWriter_op = components.create_component_from_func(
    FeatureStoreWriter, base_image="mavencode.io/spark:v3.1.1")
33. Challenges In Enterprise ML
Reproducibility
Reusability
Manageability
Automation
34. Why Machine Learning with Kubeflow?
With Kubeflow out of the box on Kubernetes, we can easily have
● Composability
● Portability
● Scalability
35. What is Kubeflow?
● Machine learning toolkit for Kubernetes.
● Platform to productionize ML models, making them simple, scalable and
reliable.
● Collection of Cloud native tools for all the stages of a model development
life cycle.
● Build integrated end-to-end pipelines which connect all the stages of a
model development life cycle.
36. Simply Put ...
Kubeflow Simplifies your Model Development Life Cycle (MDLC)
39. Enterprise Machine Learning with Kubeflow
MLOps Training and Deployment Platform
● Authentication and Authorization: in-cluster traffic control by Istio (RBAC, UI access with SSO through an identity-compatible proxy)
● Data Science Team: Data Scientist 1, Data Scientist 2, Data Scientist 3
● Namespaces, each with a Kubeflow Jupyter Notebook: Bob, Dav, Chuck, Team
● Kubeflow Managed Model Infrastructure: auto-scalable CPU node pool, auto-scalable GPU node pool
42. Airline Customer Prediction
● The dataset is from Kaggle.
● The data comes from an airline whose actual name is withheld; it is given the pseudonym Invistico Airlines.
● The dataset (23 columns and 129,880 entries) holds details of customers who have already flown with the airline.
Roles: Data Scientists, Subject Matter Experts
43. Problem Statement
Customer satisfaction is a priority in the airline industry.
Unhappy or disengaged customers naturally mean fewer passengers and less revenue.
Satisfaction is rarely about the flight alone; it also covers the experience from booking to landing. This scenario therefore aims
to build a machine learning model that uses all salient features in the data to predict customer satisfaction.
45. Exploratory Data Analysis
● Customers in business class seats were the most satisfied.
● The dataset showed more satisfied customers than dissatisfied ones, with 54.7% of the surveyed customers reporting satisfaction with their experiences.
● There were more female travelers than male travelers, and more females reported satisfaction with their experiences.
● Most customers travelled for business purposes, and satisfaction was higher among business travelers.
48. Feature Engineering
To make the data fit for our machine learning model, we performed the
following feature engineering steps:
1. Removing outliers
2. Dropping rows with null values
3. Dropping and combining columns with little or no correlation with our
target variable
4. Converting categorical features to numbers
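Steps 1, 2 and 4 above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the deck does not say which outlier rule was used, so the interquartile-range (IQR) rule here is a stand-in, and the column names `flight_distance` and `travel_class` are hypothetical. Step 3 (dropping low-correlation columns) is left out.

```python
import statistics


def drop_nulls(rows):
    """Keep only rows where every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def remove_outliers(rows, column, k=1.5):
    """Drop rows whose `column` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[column] <= high]


def encode_categorical(rows, column):
    """Replace each category with an integer code (assigned in sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes


# Made-up rows: one obvious outlier (9000) and one row with a null value.
distances = [300, 305, 310, 315, 320, 325, 330, 335, 9000]
classes = ["Eco", "Business", "Eco Plus"]
rows = [{"flight_distance": d, "travel_class": classes[i % 3]}
        for i, d in enumerate(distances)]
rows.append({"flight_distance": 400, "travel_class": None})

clean = remove_outliers(drop_nulls(rows), "flight_distance")
clean, class_codes = encode_categorical(clean, "travel_class")
```

Nulls are dropped before outlier removal so that the quartiles are computed on complete rows only.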
Roles: Data Scientists, Data Engineers
49. Feature Engineering: Outlier Removal
[Figure panels: Before Outlier Removal, After Outlier Removal]
50. Feature Engineering Data Pipeline
● Load data: reads data from source.
● Dataset Statistics: displays summary statistics of the data.
● Dataset Schema: automatically generates a schema by
inferring types, categories, and ranges from the data.
● Dataset Validation: uses the inferred schema to detect
anomalies in the data.
● Feature Engineering: performs necessary preprocessing
and feature engineering steps on the dataset.
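The schema and validation steps can be illustrated with a small stand-alone sketch. It does not reproduce the tooling used in the actual pipeline; the functions `infer_schema` and `validate` and the sample columns are assumptions made for illustration.

```python
def infer_schema(rows):
    """Infer a per-column schema: a numeric range or a set of categories."""
    schema = {}
    for column in rows[0]:
        values = [r[column] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            schema[column] = {"type": "numeric", "min": min(values), "max": max(values)}
        else:
            schema[column] = {"type": "category", "values": set(values)}
    return schema


def validate(rows, schema):
    """Return a list of (row_index, column, problem) anomalies."""
    anomalies = []
    for i, row in enumerate(rows):
        for column, spec in schema.items():
            if column not in row:
                anomalies.append((i, column, "missing column"))
                continue
            value = row[column]
            if spec["type"] == "numeric":
                if not isinstance(value, (int, float)):
                    anomalies.append((i, column, "not numeric"))
                elif not spec["min"] <= value <= spec["max"]:
                    anomalies.append((i, column, "out of range"))
            elif value not in spec["values"]:
                anomalies.append((i, column, "unseen category"))
    return anomalies


# Made-up rows: the schema is inferred from one batch and applied to the next.
train = [{"age": 25, "travel_class": "Eco"}, {"age": 60, "travel_class": "Business"}]
schema = infer_schema(train)
new = [{"age": 40, "travel_class": "Eco"}, {"age": 130, "travel_class": "First"}]
anomalies = validate(new, schema)
```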
52. ML Operators - Overview
● An ML operator helps to deploy, monitor and manage the
lifecycle of a training job.
● Kubeflow operators include
○ tf-operator
○ pytorch-operator
○ xgboost-operator
○ mpi-operator, and many more, which can be found on
the official Kubeflow account.
53. Model Training with TensorFlow Operator
● TensorFlow Operator is one of the operators offered by Kubeflow to make it easy to run and
monitor both distributed and non-distributed TensorFlow jobs on Kubernetes.
● Distributed training of TensorFlow models with tf-operator relies on centralized parameter servers for
coordination between workers. It supports the TensorFlow framework only.
● After preprocessing our data, we built a TensorFlow neural network model.
● Our TensorFlow model had an accuracy of approximately 88%.
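In a distributed TensorFlow job, each replica typically learns its role from the `TF_CONFIG` environment variable, a JSON document describing the cluster and the current task. The sketch below only parses such a value; the host names are made up, and the helper `describe_task` is hypothetical.

```python
import json
import os


def describe_task(tf_config_json):
    """Summarise the cluster layout and this replica's role from TF_CONFIG."""
    config = json.loads(tf_config_json)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return {
        "role": task.get("type"),
        "index": task.get("index"),
        "num_workers": len(cluster.get("worker", [])),
        "num_parameter_servers": len(cluster.get("ps", [])),
    }


# Hypothetical value; on a real cluster it is injected into each replica.
example = json.dumps({
    "cluster": {
        "ps": ["demo-ps-0:2222"],
        "worker": ["demo-worker-0:2222", "demo-worker-1:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

# Fall back to the example when TF_CONFIG is not set in the environment.
info = describe_task(os.environ.get("TF_CONFIG", example))
```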
54. Hyperparameter Tuning
Roles: Model Risk Managers/Auditors, ML Engineers, Data Scientists
55. What is Hyperparameter Tuning?
Hyperparameters: configuration and variable values that are external to the model; the values are always
set before the model training process begins.
Selecting the right hyperparameters can significantly improve model performance in production.
Hyperparameter tuning: finding the hyperparameter input values that optimize the objective
function of the model training.
(a1, b1, c1, ..., zN)
(a2, b2, c2, ..., zN)
(a3, b3, c3, ..., zN)
57. Hyperparameter Tuning is Hard!
● Tuning manually is inefficient, error-prone and difficult to track.
● Capturing metrics across multiple jobs and comparing them is difficult.
● Efficiently allocating resources and infrastructure on the cluster to handle all the job runs is not an easy
task.
● As more hyperparameters are added, the combinatorial search space of possible inputs to maximize the
training objective function grows exponentially.
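The growth of the search space can be made concrete with a short calculation. The hyperparameter names and candidate values below are hypothetical; the point is only that the number of combinations is the product of the number of candidates per hyperparameter.

```python
import itertools
import math

# Hypothetical grid: three candidate values for each hyperparameter.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
    "num_layers": [2, 3, 4],
}


def grid_size(space):
    """Number of combinations in a full grid over `space`."""
    return math.prod(len(values) for values in space.values())


# Add one hyperparameter at a time and record how the grid grows.
sizes = []
partial = {}
for name, values in search_space.items():
    partial[name] = values
    sizes.append(grid_size(partial))

# Enumerating the full grid yields exactly grid_size(search_space) tuples.
all_combinations = list(itertools.product(*search_space.values()))
```

With three candidates each, every added hyperparameter triples the number of training jobs needed for an exhaustive search.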
58. Hyperparameter Tuning with Katib on Kubeflow
Katib is the hyperparameter tuning component of Kubeflow.
It is language and framework agnostic:
- TensorFlow
- PyTorch
- MXNet
- XGBoost
Customizable hyperparameter search algorithms:
- Random Search
- Grid Search
- Bayesian Optimization
- Hyperband
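To show how two of the listed search algorithms differ, here is a minimal stand-alone sketch of grid search and random search over a toy objective. It does not use Katib; the objective function and parameter ranges are invented for illustration.

```python
import itertools
import math
import random


def objective(learning_rate, num_layers):
    """Toy stand-in for a validation metric; best at lr=0.01 and 3 layers."""
    return -(math.log10(learning_rate) + 2) ** 2 - (num_layers - 3) ** 2


def grid_search(learning_rates, layer_counts):
    """Evaluate every combination and return the best one."""
    return max(itertools.product(learning_rates, layer_counts),
               key=lambda params: objective(*params))


def random_search(trials, seed=0):
    """Sample `trials` random points: log-uniform rate, uniform layer count."""
    rng = random.Random(seed)
    samples = [(10 ** rng.uniform(-4, 0), rng.randint(1, 6)) for _ in range(trials)]
    return max(samples, key=lambda params: objective(*params))


best_grid = grid_search([0.001, 0.01, 0.1], [2, 3, 4])
best_random = random_search(trials=20)
```

Grid search only ever tries the listed values, while random search can land between them; which works better depends on how sensitive the objective is to each hyperparameter.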
59. Katib Concepts
1. Experiment: An experiment is a single tuning run, also called an optimization run. You specify configuration
settings to define the experiment. The following are the main configurations:
● Objective: What you intend to optimize. This is the objective metric, also called the target variable.
● Search Space: The set of all possible hyperparameter values that the hyperparameter tuning job
should consider for optimization, and the constraints for each hyperparameter.
● Search Algorithm: The algorithm to use when searching for the optimal hyperparameter values.
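The three configurations above map onto an experiment definition roughly as sketched below, written as a Python dictionary. The field names are modelled on the Katib v1beta1 Experiment resource and should be checked against the installed Katib version; the experiment name, namespace, metric goal and parameter ranges are hypothetical, and the trial template is omitted.

```python
experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    # Hypothetical name and namespace.
    "metadata": {"name": "airline-satisfaction-tuning", "namespace": "kubeflow"},
    "spec": {
        # Objective: the metric to optimize and the direction.
        "objective": {"type": "maximize", "goal": 0.95, "objectiveMetricName": "accuracy"},
        # Search algorithm.
        "algorithm": {"algorithmName": "random"},
        "parallelTrialCount": 3,
        "maxTrialCount": 12,
        "maxFailedTrialCount": 3,
        # Search space: one entry per hyperparameter with its constraints.
        "parameters": [
            {"name": "learning_rate", "parameterType": "double",
             "feasibleSpace": {"min": "0.001", "max": "0.1"}},
            {"name": "num_layers", "parameterType": "int",
             "feasibleSpace": {"min": "2", "max": "5"}},
        ],
        # The trial template (the training job run for each trial) is omitted.
    },
}


def summarize(exp):
    """Pull out the objective, algorithm and parameter names of an experiment."""
    spec = exp["spec"]
    return {
        "objective": (spec["objective"]["type"], spec["objective"]["objectiveMetricName"]),
        "algorithm": spec["algorithm"]["algorithmName"],
        "parameters": [p["name"] for p in spec["parameters"]],
    }


summary = summarize(experiment)
```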
60. Hyperparameter Tuning with Katib
Katib automates the Hyperparameter Tuning
process by running a pre-configured number of
training jobs (known as trials) in parallel.
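The idea of running a fixed number of trials in parallel can be sketched with a thread pool. This is an analogy rather than how Katib is implemented: `run_trial` is a hypothetical stand-in for a training job, and the scoring formula is invented.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trial(params):
    """Hypothetical training job: returns the objective metric for one trial."""
    # Stand-in score that peaks at learning_rate = 0.01.
    return 1.0 - abs(params["learning_rate"] - 0.01)


def run_experiment(max_trials, parallel_trials, seed=0):
    """Run `max_trials` trials, at most `parallel_trials` at a time."""
    rng = random.Random(seed)
    suggestions = [{"learning_rate": rng.uniform(0.001, 0.1)} for _ in range(max_trials)]
    with ThreadPoolExecutor(max_workers=parallel_trials) as pool:
        scores = list(pool.map(run_trial, suggestions))
    best = max(range(max_trials), key=scores.__getitem__)
    return suggestions[best], scores[best]


best_params, best_score = run_experiment(max_trials=12, parallel_trials=3)
```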
61. Result of Katib Experiment
With Katib hyperparameter tuning, accuracy increased from 88% to 92.1%.
62. Model Serving with KFServing
● KFServing is Kubeflow’s model deployment
and serving toolkit.
● To efficiently serve our model using
KFServing, we built a Kubeflow pipeline to
load data, preprocess, train the model, make
predictions, and export and serve the model.
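Once a model is served, a client could query it over HTTP. The sketch below builds, but does not send, a prediction request in the style of the V1 inference protocol (`/v1/models/<name>:predict` with an `instances` payload). The endpoint, model name and feature row are hypothetical, and the exact URL depends on how the service is exposed.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances):
    """Build (but do not send) a V1-protocol prediction request."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


# Hypothetical endpoint, model name and encoded feature row.
request = build_predict_request(
    "http://models.example.com", "airline-satisfaction", [[35, 1, 0, 2, 460]])
# urllib.request.urlopen(request) would send it to a live service.
```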
68. Model Development Life Cycle (Data Scientist View)
Data → Information → Knowledge → Insight
The data scientist workflow essentially follows this path ...
69. Machine Learning Development Life Cycle (Production Deployment)
[Diagram stages: Training Data, ETL, Model Training, Tuning, Inferencing, Serving, Monitoring, Update]