Productionizing real-time ML models poses unique data engineering challenges for enterprises that are coming from batch-oriented analytics. Enterprise data, which has traditionally been centralized in data warehouses and optimized for BI use cases, must now be transformed into features that provide meaningful predictive signals to ML models.
Accelerating the ML Lifecycle with an Enterprise-Grade Feature Store
Mike Del Balso, Tecton
Geoff Sims, Atlassian
Spark+AI Summit
Wednesday, 6/24 3:40pm PDT
2. About Us
Mike Del Balso
Product Manager / Ads ML
Product Manager / Created Michelangelo Platform
Co-founder and CEO / Data Platform for ML
Geoff Sims
Principal Data Scientist
3. Machine learning today often falls short of its potential
Limited predictive data
Long development cycles
Painful path to production
4. Operational ML powers automated business decisions and customer experiences at scale
Common Use Cases
Fraud detection
CTR prediction
Pricing
Customer support
Recommendation
Search
Segmentation
Chatbots
Characteristics
Customer-facing impact
Time-sensitive
Production SLAs
Subject to regulation
Cross-functional stakeholders
5. Building operational ML applications is very complex. Data is at the core of that complexity.
[Figure, adapted from Sculley et al.: the components of an operational ML system (Data Collection, Data Verification, Feature Engineering, Testing and Debugging, Resource Management, Metadata Management, Process Management, Serving Infrastructure, Monitoring), grouped into Configuration, Automation, Data-Related, ML Code, and Model Analysis.]
Sculley, David, et al. "Machine learning: The high interest credit card of technical debt." (2014).
6. Features are the signals we extract from data and are a critical part of any ML application.
“Applied machine learning is basically feature engineering.”
– Andrew Ng
user_id   user_clicks_last7d   user_in_target_cou...   ...
1001      2                    1                       ...
1002      13                   1                       ...
1003      0                    0                       ...
...       ...                  ...                     ...
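Each column in a feature table like this is the output of a transformation over raw event data. As a minimal sketch, assuming raw clicks arrive as (user_id, timestamp) pairs, the user_clicks_last7d column could be computed as below. The function name and input shape are illustrative assumptions, not Tecton's API.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

def user_clicks_last7d(
    clicks: Iterable[Tuple[int, datetime]],
    user_ids: List[int],
    as_of: datetime,
) -> Dict[int, int]:
    """Count each user's clicks in the 7 days before `as_of` (hypothetical helper).

    The window is [as_of - 7 days, as_of): clicks at or after `as_of` are
    ignored, and users with no clicks in the window get 0.
    """
    window_start = as_of - timedelta(days=7)
    counts = {user_id: 0 for user_id in user_ids}
    for user_id, click_time in clicks:
        if user_id in counts and window_start <= click_time < as_of:
            counts[user_id] += 1
    return counts

clicks = [
    (1001, datetime(2020, 2, 1)),
    (1001, datetime(2020, 2, 5)),
    (1002, datetime(2020, 1, 20)),  # before the window, so ignored
    (1001, datetime(2020, 2, 9)),   # after as_of, so ignored
]
print(user_clicks_last7d(clicks, [1001, 1002, 1003], as_of=datetime(2020, 2, 8)))
# {1001: 2, 1002: 0, 1003: 0}
```

Returning an explicit 0 for users with no recent clicks mirrors the 1003 row in the table: absence of activity is itself a signal, so it should not silently drop out of the feature set.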
7. Tooling for managing features is almost non-existent
[Figure: three lifecycles side by side. Apps: a DevOps pipeline (Development, Build, Run) takes code to production. Models: an MLOps pipeline (Development, Model Training, Model Serving). Features / Data: Feature Engineering and Feature Serving, with the pipeline between them marked "?".]
8. Tecton is a data platform for ML applications
Faster development cycles
Lower time-to-production
Lower operational cost
Easier adoption of ML across teams
9. Critical problems solved by data platforms for ML:
Managing sprawling feature transform logic
Building high-quality training sets from messy data
Deploying to production & moving beyond batch to real-time
10. Challenge 1: Managing sprawling and disconnected feature transform logic
[Figure: Raw Data → Features → Model]
Features are some of the most highly curated and refined data in a business, yet they are also some of the most poorly managed assets.
13. Solution: Easily contribute new features to the feature store
1. Define a feature: feature_repo/ads_behavior/user_clicks.py
2. Save it to the feature store with the Tecton CLI: $ tecton apply
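The define-then-apply workflow can be mimicked in plain Python to show why a central registry helps with sprawling transform logic: every feature gets one name, one owner, and one implementation. This is not Tecton's SDK. FeatureDefinition, FeatureRegistry, its apply method, and the "ads-ml" owner are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical feature definition: a named transform plus ownership metadata."""
    name: str
    entity: str
    owner: str
    description: str
    transform: Callable

class FeatureRegistry:
    """Hypothetical in-memory stand-in for a feature store's central registry."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def apply(self, *definitions: FeatureDefinition) -> List[str]:
        """Validate a batch of definitions, then register all of them or none."""
        names = [d.name for d in definitions]
        if len(names) != len(set(names)):
            raise ValueError("duplicate feature names in a single apply")
        for d in definitions:
            if not d.owner:
                raise ValueError(f"feature {d.name!r} has no owner")
        for d in definitions:
            self._features[d.name] = d  # new names are added, existing ones replaced
        return sorted(names)

    def get(self, name: str) -> FeatureDefinition:
        return self._features[name]

    def names(self) -> List[str]:
        return sorted(self._features)

registry = FeatureRegistry()
user_clicks = FeatureDefinition(
    name="user_clicks_last7d",
    entity="user_id",
    owner="ads-ml",  # hypothetical team name
    description="Number of ad clicks by the user in the last 7 days",
    transform=lambda click_times: len(click_times),
)
print(registry.apply(user_clicks))  # ['user_clicks_last7d']
```

Validating the whole batch before registering anything keeps the registry consistent: a bad definition in one file cannot leave half of a change applied.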
15. Challenge 2: Building high-quality training sets from messy data
Common challenges assembling training data:
Stitching multiple data pipelines / data sources together
Feature backfills
Time travel + point-in-time correctness
Data leakage
Delivering training data to training jobs
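A feature backfill recomputes a feature's historical values so that a newly defined feature can be used with past training events. The sketch below assumes clicks are stored as (user_id, date) pairs and produces one row per user per day; the function name is hypothetical, and the naive nested scan is only for clarity (a real backfill would typically run as a distributed batch job).

```python
from datetime import date, timedelta
from typing import List, Tuple

def backfill_user_clicks(
    clicks: List[Tuple[int, date]],
    user_ids: List[int],
    start: date,
    end: date,
    window_days: int = 7,
) -> List[Tuple[date, int, int]]:
    """Backfill one (as_of_day, user_id, clicks_in_window) row per user per day.

    Each value only counts clicks strictly before its as-of day, so the
    historical values match what would have been known at that time.
    """
    rows = []
    day = start
    while day <= end:
        window_start = day - timedelta(days=window_days)
        for user_id in user_ids:
            value = sum(
                1 for uid, click_day in clicks
                if uid == user_id and window_start <= click_day < day
            )
            rows.append((day, user_id, value))
        day += timedelta(days=1)
    return rows

clicks = [(1, date(2020, 2, 1)), (1, date(2020, 2, 3)), (2, date(2020, 2, 2))]
for row in backfill_user_clicks(clicks, [1, 2], date(2020, 2, 2), date(2020, 2, 4)):
    print(row)
```

The strict `< day` bound is what ties backfills to the data-leakage item above: a snapshot for a given day never sees events from that day onward.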
16. Solution: Configuration-based training data set generation through simple APIs
1. Configure what features you want in a training dataset
2. Get training data for the events of interest
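The two steps on this slide can be sketched as a function that takes a list of feature names (the configuration) and a list of events, and returns one enriched row per event. get_training_data and the lookup-function shape are hypothetical stand-ins, not Tecton's actual API.

```python
from datetime import datetime
from typing import Callable, Dict, List

# A feature lookup takes (user_id, as_of) and returns the value known at that time.
FeatureLookup = Callable[[int, datetime], float]

def get_training_data(
    feature_config: List[str],
    events: List[dict],
    lookups: Dict[str, FeatureLookup],
) -> List[dict]:
    """Hypothetical config-driven API: one output row per event of interest."""
    # Step 1: the configuration is just a list of feature names; validate it up front.
    unknown = [name for name in feature_config if name not in lookups]
    if unknown:
        raise KeyError(f"unknown features in config: {unknown}")
    # Step 2: for each event, look up every configured feature as of the event time.
    rows = []
    for event in events:
        row = dict(event)
        for name in feature_config:
            row[name] = lookups[name](event["user_id"], event["timestamp"])
        rows.append(row)
    return rows

# Toy lookup that ignores time; a real one would respect `as_of`.
lookups = {"user_clicks_last7d": lambda user_id, as_of: {111: 2, 222: 13}.get(user_id, 0)}
events = [
    {"user_id": 111, "ad_id": 444, "timestamp": datetime(2020, 2, 1)},
    {"user_id": 333, "ad_id": 666, "timestamp": datetime(2020, 2, 1)},
]
training = get_training_data(["user_clicks_last7d"], events, lookups)
print([row["user_clicks_last7d"] for row in training])  # [2, 0]
```

Keeping the configuration as plain data means the same feature list can be versioned with the model and reused to request serving features later.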
17. Solution: Built-in row-level time travel for accurate training data
1. Modeler provides event time and entity IDs:
user_id   ad_id   timestamp
111       444     2020-02-01...
222       555     2020-02-02...
333       666     2020-02-01...
...       ...     ...
2. Tecton returns point-in-time correct feature values:
distinct_users   user_clicks_last7d   user_in_target_cou...   ...
24               2                    1                       ...
21               13                   1                       ...
20               0                    0                       ...
...              ...                  ...                     ...
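Row-level time travel amounts to an as-of join: for each event, take the most recent feature value that was valid before the event time. This sketch uses pandas.merge_asof as a stand-in technique; it is not a description of how Tecton implements it, and the feature_log values are made up for illustration. Setting allow_exact_matches=False also excludes values stamped at exactly the event time, a conservative choice against leakage.

```python
import pandas as pd

# Events of interest: entity IDs plus the time each prediction was made.
events = pd.DataFrame({
    "user_id": [111, 222, 111],
    "ad_id": [444, 555, 666],
    "timestamp": pd.to_datetime(["2020-02-01 12:00", "2020-02-02 09:00", "2020-02-03 08:00"]),
})

# Hypothetical feature log: each row is a feature value and the time it became valid.
feature_log = pd.DataFrame({
    "user_id": [111, 111, 222, 111],
    "timestamp": pd.to_datetime(["2020-01-31", "2020-02-02", "2020-02-01", "2020-02-04"]),
    "user_clicks_last7d": [2, 5, 13, 9],
})

def point_in_time_join(events: pd.DataFrame, feature_log: pd.DataFrame) -> pd.DataFrame:
    # merge_asof requires both frames to be sorted by the "on" column.
    left = events.sort_values("timestamp")
    right = feature_log.sort_values("timestamp")
    return pd.merge_asof(
        left, right,
        on="timestamp",
        by="user_id",             # only match rows for the same entity
        direction="backward",     # take the latest value at or before the event...
        allow_exact_matches=False,  # ...but strictly before it, to avoid leakage
    )

training = point_in_time_join(events, feature_log)
print(training)
```

The last event (user 111 on 2020-02-03) picks up the value 5 from 2020-02-02, not the later value 9; a naive join on "latest value" would have leaked that future information into training.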
18. Challenge 3: Deploying to production & moving from batch to real-time
Common challenges when deploying to production:
Throwing it over the wall for reimplementation in the production environment
Infrastructure provisioning
Freshness vs. cost
Train-serve skew
Drift & data quality monitoring
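For the drift-monitoring item, one common check compares a feature's distribution at training time with its distribution in production. The sketch below uses the population stability index (PSI) as a swapped-in example metric, assuming fixed bin edges chosen ahead of time; the slide does not say which metric Tecton uses.

```python
import bisect
import math
from typing import List, Sequence

def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values per bin; bins are [edges[i], edges[i+1]), last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:  # values outside the edges are dropped
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]

def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum((a - e) * ln(a / e)) over bins; eps avoids log(0) for empty bins."""
    e = [max(p, eps) for p in bin_proportions(expected, edges)]
    a = [max(p, eps) for p in bin_proportions(actual, edges)]
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

training_values = [0, 0, 1, 1]  # feature values seen at training time
serving_values = [0, 1, 1, 1]   # the same feature observed in production
print(round(population_stability_index(training_values, serving_values, edges=[0, 1, 2]), 4))
# 0.2747
```

PSI is 0 when the two distributions match and grows as they diverge, so it can be tracked per feature over time and alerted on past a chosen threshold.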
23. Solution: Unified train/predict pipelines ensure online/offline consistency
[Figure: Raw Data and Production Data both pass through a single Training + Serving Implementation, which produces Training Features for model training and Serving Features for prediction.]
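The core idea behind a unified pipeline is that training and serving call the same feature code, so the two paths cannot drift apart. In this minimal sketch, compute_features, build_training_rows, and serve_features are hypothetical names; the point is only that both paths delegate to one implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Click = Tuple[int, datetime]

def compute_features(user_id: int, clicks: Iterable[Click], as_of: datetime) -> Dict[str, int]:
    """The single feature implementation shared by the training and serving paths."""
    window_start = as_of - timedelta(days=7)
    count = sum(1 for uid, t in clicks if uid == user_id and window_start <= t < as_of)
    return {"user_clicks_last7d": count}

def build_training_rows(events: List[Tuple[int, datetime]], clicks: List[Click]) -> List[dict]:
    """Offline path: each historical event gets features computed as of its own time."""
    return [
        {"user_id": user_id, "timestamp": ts, **compute_features(user_id, clicks, ts)}
        for user_id, ts in events
    ]

def serve_features(user_id: int, clicks: List[Click], now: datetime) -> Dict[str, int]:
    """Online path: one entity, computed as of request time, via the same function."""
    return compute_features(user_id, clicks, now)

clicks = [(1, datetime(2020, 2, 1)), (1, datetime(2020, 2, 5)), (2, datetime(2020, 2, 4))]
request_time = datetime(2020, 2, 6)
offline = build_training_rows([(1, request_time)], clicks)[0]["user_clicks_last7d"]
online = serve_features(1, clicks, request_time)["user_clicks_last7d"]
print(offline, online)  # 2 2
```

Because both paths share compute_features, the same entity and timestamp produce the same value offline and online, which is exactly the train-serve skew the previous slide lists as a challenge.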
24. Solution: Tecton delivers those features “online” for real-time predictions
[Figure: a Virtual Data Source feeds the same Training + Serving Implementation, which produces Training Features for training and Serving Features for prediction.]
Serving Features:
● Single row
● Delivered in milliseconds
● REST API
Training Features:
● Millions of rows
● Dataframe API
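The two access patterns on this slide (single-row online lookups versus bulk offline DataFrames) can be illustrated with a toy two-tier store. ToyFeatureStore and its methods are hypothetical; a Python dict stands in for a low-latency key-value store, and the REST layer and millisecond latency are not modeled.

```python
from typing import Iterable, Optional

import pandas as pd

class ToyFeatureStore:
    """Hypothetical two-tier store: full history offline, latest values online."""

    def __init__(self, history: pd.DataFrame, entity: str = "user_id", time_col: str = "timestamp"):
        self.entity = entity
        self.offline = history.sort_values(time_col).reset_index(drop=True)
        # Materialize the most recent row per entity into a dict for constant-time lookups.
        latest = self.offline.groupby(entity).tail(1)
        self.online = latest.set_index(entity).drop(columns=[time_col]).to_dict(orient="index")

    def get_online_features(self, entity_id) -> Optional[dict]:
        """Single-row lookup, the shape of request a real-time prediction makes."""
        row = self.online.get(entity_id)
        return dict(row) if row is not None else None

    def get_offline_features(self, entity_ids: Optional[Iterable] = None) -> pd.DataFrame:
        """Bulk historical rows as a DataFrame, the shape a training job consumes."""
        df = self.offline
        if entity_ids is not None:
            df = df[df[self.entity].isin(list(entity_ids))]
        return df.reset_index(drop=True)

history = pd.DataFrame({
    "user_id": [111, 111, 222],
    "timestamp": pd.to_datetime(["2020-02-01", "2020-02-03", "2020-02-02"]),
    "user_clicks_last7d": [2, 5, 13],
})
store = ToyFeatureStore(history)
print(store.get_online_features(111))
print(len(store.get_offline_features()))  # 3
```

Materializing only the latest value per entity keeps the online tier small and fast, while the offline tier retains full history for point-in-time training queries.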