In this talk, you’ll learn about techniques used to build a feature-drift-detection-as-a-service capability for your enterprise and beyond. Feature drift monitoring is a way to check the volatility of machine learning model inputs. It can trigger investigations into potential model degradation and help explain why a model’s behavior has shifted.
Feature drift monitoring as a service for machine learning models at scale
1. Feature drift monitoring as a service for machine learning models at scale
PyData Global 2020
Keira Zhou
Noriaki (Nori) Tatsumi
2. A feature drift is a change in the joint distribution of a feature and a target
Covariate shift
Feature distribution change without label distribution change
Prior probability shift
Label distribution change without feature distribution change
Concept shift
Feature and label distributions stay the same, but the relationship between the two changes
https://towardsdatascience.com/understanding-dataset-shift-f2a5a262a766
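For reference, the three shift types can be written in the standard notation from the dataset-shift literature (features X, target y, with "old" denoting the training-time distribution and "new" the serving-time distribution). This formalization is general background rather than anything specific to the service:

```latex
% Covariate shift: the input distribution moves, the labeling rule does not
P_{\mathrm{new}}(X) \neq P_{\mathrm{old}}(X), \qquad P_{\mathrm{new}}(y \mid X) = P_{\mathrm{old}}(y \mid X)

% Prior probability shift: the label distribution moves, class-conditional features do not
P_{\mathrm{new}}(y) \neq P_{\mathrm{old}}(y), \qquad P_{\mathrm{new}}(X \mid y) = P_{\mathrm{old}}(X \mid y)

% Concept shift: the feature-to-label relationship moves while the feature distribution stays put
P_{\mathrm{new}}(y \mid X) \neq P_{\mathrm{old}}(y \mid X), \qquad P_{\mathrm{new}}(X) = P_{\mathrm{old}}(X)
```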
3. Why does an enterprise with business critical ML models need easy access to a comprehensive feature drift monitoring solution?
• Machine learning is learning from data (i.e. features)
• Many models are very brittle
• Prevent financial loss and harm to the brand of your business
• Not every ML team has the resources to build and maintain a complete monitoring solution
4. Our feature monitoring service provides statistics- and model-based metrics and analysis for detecting feature drifts
Descriptive statistics
mean, median, min, max, standard deviation, percentiles
Data quality metrics
count, sum, # of NULLs, # of NaNs
Statistics- and model-based analysis
population stability index (PSI), time-series changepoint and anomaly detection (a PSI sketch follows this list)
Interactive User Interfaces
time-series visualization dashboards, API, SQL interface, alerts
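As a concrete illustration of PSI, here is a minimal NumPy sketch of a binned PSI calculation. The function name, binning scheme, and example data are illustrative assumptions, not the service's actual implementation.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Binned PSI between a baseline window ("expected") and a newer
    window ("actual") of a numeric feature; larger values mean more drift."""
    # Derive bin edges from the baseline so both windows share the same bins.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_counts, _ = np.histogram(expected, bins=edges)
    actual_counts, _ = np.histogram(actual, bins=edges)
    expected_pct = expected_counts / expected_counts.sum()
    actual_pct = actual_counts / actual_counts.sum()
    # Floor the proportions to avoid log(0) and division by zero.
    eps = 1e-6
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Example: a mean shift of 0.5 standard deviations yields a clearly elevated PSI.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)
current = rng.normal(loc=0.5, scale=1.0, size=10_000)
print(population_stability_index(baseline, current))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.2 as worth investigating, and above 0.2 as significant drift, though thresholds should be tuned per feature.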
5. Our feature monitoring service empowers our users to continuously verify that their models are performing well
Reactive re-training trigger
Trigger for model developers to investigate potential model degradation
Proactive feature selection
A way to check the volatility of features - may lead to omission of use or more frequent monitoring
Model degradation analysis
Could explain why a model has shifted
6. The 5 key design decisions for our scalable feature monitoring service
Features are cataloged in a registry and persisted in standardized formats with timestamps
The features must be discoverable and readable
Client users and applications must bring their own access to features so the platform never holds the keys to the kingdom
Minimize the blast radius from a potential security event
Users need to be able to specify their groupby keys to produce meaningful metrics and analysis
Aggregation attributes are configurable
The service is a distributed system with ephemeral processes and a resilient and robust orchestrator
Isolate failures across the multiple tenants
Provide tools to visualize, slice and dice the metrics and analysis
Empower users to derive conclusions and decisions
7. Feature Data Pipeline Architecture
[Architecture diagram: Feature Compute jobs emit feature values (Avro, Parquet, CSV) through batch, streaming, and API feature persistence channels into the Enterprise Data Ingestion Service; the ingestion service lands feature values as Parquet in Enterprise File Storage (AWS S3) and feature metadata in the Enterprise Data (Feature) Registry, both of which Feature Monitoring as a Service reads over HTTP/gRPC.]
● An Enterprise Data Registry that catalogs each feature’s
ID, data format, schema, location, partition keys, etc
● A unified Enterprise Data Ingestion Service for all feature
compute outputs in various execution contexts that sinks
all data as Parquet files in AWS S3 storage
9. Trigger and Configuration of Feature Statistic Calculation
• An API is the entry point of the pipeline (an illustrative request is sketched below)
• Each feature is uniquely identified by a Feature ID
• Receives a Dataset ID and location from the user
• Retrieves Feature IDs from the Enterprise Feature Registry based on the Dataset ID
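A minimal sketch of what a trigger call could look like, assuming a hypothetical endpoint and field names (the real service's URL, payload schema, and auth are internal and not documented here):

```python
import requests

# Hypothetical trigger request: the user supplies the Dataset ID and the
# dataset's location; the service resolves the dataset's Feature IDs from
# the Enterprise Feature Registry and kicks off the PySpark job.
payload = {
    "dataset_id": "ds-12345",  # Enterprise Dataset Unique ID
    "dataset_location": "s3://example-bucket/features/2020/09/",
}
resp = requests.post(
    "https://feature-monitoring.example.internal/api/v1/stats-jobs",
    json=payload,
    timeout=30,
)
resp.raise_for_status()
job = resp.json()
print(job)  # e.g. a job ID plus the resolved Feature IDs
```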
10. Trigger and Configuration of Statistic Calculation (Cont’d)
Triggers the PySpark EMR cluster with configuration parameters:
• Dataset location
• Enterprise Dataset Unique ID
• Enterprise Feature IDs
• Temporary Client Credentials: used to access the dataset
• Partition Timestamp (ETL time): when the features were calculated
• Field Timestamp (event time): indicates which field is the event timestamp
• Aggregation Fields: the fields to aggregate and produce stats on
Example feature values:

Biking Length (mile) | Biking Elevation (ft) | Event Time | ETL Time
5                    | 243                   | 202005     | 202009
10                   | 100                   | 202005     | 202009
8                    | 185                   | 202006     | 202009
20                   | 320                   | 202007     | 202010
15                   | 231                   | 202008     | 202010

Aggregated by Event Time:

Avg Biking Length (mile) | Avg Biking Elevation (ft) | Event Time
7.5                      | 171.5                     | 202005
8                        | 185                       | 202006
20                       | 320                       | 202007
15                       | 231                       | 202008

Aggregated by ETL Time:

Avg Biking Length (mile) | Avg Biking Elevation (ft) | ETL Time
7.67                     | 176                       | 202009
17.5                     | 275.5                     | 202010
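The aggregation above can be reproduced with a short PySpark snippet; the column and function names below are illustrative, but the groupBy key is exactly the configurable Aggregation Field described earlier:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Recreate the biking example from the slide.
rows = [
    (5, 243, "202005", "202009"),
    (10, 100, "202005", "202009"),
    (8, 185, "202006", "202009"),
    (20, 320, "202007", "202010"),
    (15, 231, "202008", "202010"),
]
df = spark.createDataFrame(
    rows, ["biking_length_mi", "biking_elevation_ft", "event_time", "etl_time"]
)

def aggregate_by(df, key):
    """Average each feature per value of the chosen aggregation field."""
    return df.groupBy(key).agg(
        F.avg("biking_length_mi").alias("avg_biking_length_mi"),
        F.avg("biking_elevation_ft").alias("avg_biking_elevation_ft"),
    )

aggregate_by(df, "event_time").show()  # four rows, one per event month
aggregate_by(df, "etl_time").show()    # two rows, one per ETL partition
```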
11. Distributed Stats Calculation
• Stats calculated (sketched in PySpark after this list)
• min, max, average, standard deviation
• median, 25% & 75% quantiles,
• count, # of null, # of nan
• PSI
• Runs on EMR:
• Ephemeral
• Separate cluster per calculation
• All the results are
• Sent to Enterprise Kafka Cluster
• Saved into Enterprise Managed S3
• Saved into Postgres Database
• All stats are connected to a job ID
• Easier debugging
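A minimal PySpark sketch of the per-feature statistics listed above (the function and column names are illustrative; the real job also computes PSI against a baseline and writes results out as described):

```python
from pyspark.sql import functions as F

def feature_stats(df, col):
    """Compute the descriptive statistics and data quality metrics for one
    numeric feature column in a single distributed aggregation."""
    return df.agg(
        F.min(col).alias("min"),
        F.max(col).alias("max"),
        F.avg(col).alias("avg"),
        F.stddev(col).alias("stddev"),
        F.expr(f"percentile_approx({col}, array(0.25, 0.5, 0.75))").alias("quantiles"),
        F.count(F.lit(1)).alias("count"),
        F.count(F.when(F.col(col).isNull(), 1)).alias("null_count"),
        F.count(F.when(F.isnan(F.col(col)), 1)).alias("nan_count"),
    )

# Usage, assuming `df` is the feature DataFrame loaded from the client's dataset:
# stats_row = feature_stats(df, "biking_length_mi").collect()[0]
```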
12. Postgres Table Design
• Feature Stats table
• Stores all computed stats
• Parent-child table design based on feature name
• Feature Stats Job Status table
• Tracks the status of a job
• Updated by Trigger API, PySpark job and Ingestion Engine
[Diagram: the parent table holds one row per feature (feature_1_table_pointer, feature_2_table_pointer), each pointing to that feature's child stats table (feature_1_child_table, feature_2_child_table).]
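One way such a parent-child layout could be expressed in Postgres is sketched below; the table and column names, and the use of psycopg2, are assumptions for illustration rather than the service's actual schema:

```python
import psycopg2

# Hypothetical schema: a parent table maps each feature to its own child
# stats table, stats rows carry the job ID for debugging, and a separate
# table tracks job status for the Trigger API, PySpark job, and Ingestion Engine.
conn = psycopg2.connect("dbname=feature_monitoring")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS feature_stats_parent (
            feature_id   TEXT PRIMARY KEY,
            child_table  TEXT NOT NULL            -- e.g. 'feature_1_child_table'
        );
        CREATE TABLE IF NOT EXISTS feature_1_child_table (
            job_id        TEXT NOT NULL,          -- links stats to the Spark job that produced them
            partition_ts  TIMESTAMPTZ NOT NULL,   -- ETL time of the feature values
            stat_name     TEXT NOT NULL,          -- e.g. 'avg', 'null_count', 'psi'
            stat_value    DOUBLE PRECISION,
            PRIMARY KEY (job_id, partition_ts, stat_name)
        );
        CREATE TABLE IF NOT EXISTS feature_stats_job_status (
            job_id      TEXT PRIMARY KEY,
            status      TEXT NOT NULL,
            updated_at  TIMESTAMPTZ DEFAULT now()
        );
    """)
```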
13. Managed Kubernetes Cluster
• Most of our components are running on a managed Kubernetes cluster in AWS
• Individuals get a personal namespace; teams get a team namespace
• Helm charts to configure the different environments: dev, qa, prod
• Skaffold to build, push and deploy the application
[Diagram: the Dockerized Java application is built and pushed to the internal Dockyard registry, then deployed onto the Kubernetes cluster.]
14. Monitoring Statistics Serving Interface
• Dashboard
• A clear centralized view of various feature statistics
• Connect to Postgres DB
• GraphQL API
• Retrieve stats of a given feature
• Good for customized plotting
• Integration with Jupyter notebook or other applications
[Dashboard screenshot: feature statistics aggregated by event time across two different partitions.]
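As an illustration of the GraphQL API, here is a hypothetical query issued from a Jupyter notebook; the endpoint, field names, and argument names are assumptions, not the service's published schema:

```python
import requests

# Hypothetical GraphQL query for the statistics of one feature over a time range.
query = """
query FeatureStats($featureId: String!, $start: String!, $end: String!) {
  featureStats(featureId: $featureId, start: $start, end: $end) {
    eventTime
    statName
    statValue
  }
}
"""
variables = {"featureId": "biking_length_mi", "start": "202005", "end": "202008"}

resp = requests.post(
    "https://feature-monitoring.example.internal/graphql",
    json={"query": query, "variables": variables},
    timeout=30,
)
resp.raise_for_status()
stats = resp.json()["data"]["featureStats"]
# The rows can then be fed to pandas/matplotlib for customized plotting.
```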
15. The 5 key design decisions for our scalable feature monitoring service
Features are cataloged in a registry and persisted in standardized formats with timestamps
The features must be accessible, identifiable and readable
i.e. standard time-series ingestion pipeline with Parquet output and features registered in the enterprise Feature Registry

Client users and applications must bring their own access to features so the platform never holds the keys to the kingdom
Minimize the blast radius from a potential security event
i.e. borrow clients’ temporary AWS STS tokens and track the activity in the audit log

Users need to be able to specify their groupby keys to produce meaningful metrics and analysis
Aggregation attributes are configurable
i.e. enable users to configure the aggregation key per feature via the REST API

The service is a distributed system with ephemeral processes and a resilient and robust orchestrator
Isolate failures across the multiple tenants
i.e. ephemeral EMR instances for Spark jobs and microservices orchestrated by Kubernetes (K8s)

Provide tools to visualize, slice and dice the metrics and analysis
Empower users to derive conclusions and decisions
i.e. time-series visualization with Grafana and a data-driven GraphQL API for interacting with the Monitoring Service