2. ML Operational Capabilities
Figure: Where business value is generated in AI (axis: Business Value)
• Traditional Analytics
• Analytical ML: offline inference on batch features, using batch jobs and the Offline Feature Store (training/test data); offline predictions, batch updates
• Operational ML: online inference on batch features, using Model Serving and the Online Feature Store; online predictions, batch updates
• Real-Time Machine Learning: online inference on streaming features, using Model Serving and the Online Feature Store; online predictions, real-time updates
4. From Raw Data to Online Predictions
Figure: Batch data (DataFrames) and streaming data (data instances) are written to Feature Groups in the Feature Store. Feature Views serve feature vectors through the Online API and files/DataFrames through the Offline API. Trained models (model artifact, model files) are stored in the Model Registry and deployed to Model Serving, where a Prediction Service (Transformer and Predictor, plus serving code) runs on a Model Server, returns online predictions through a REST API, and writes inference logs (data instances) through an Inference Logger.
• Feature Store: search, versioning, statistics, transformations
• Lineage and provenance
• Models: versioning, experiments, metrics, code, canary deployments, A/B testing
5. Keeping Your Pipelines on Track
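Figure: Batch and streaming feature pipelines read from a data warehouse (historical data) and from applications and services (context, trends) and write to Feature Groups. Training pipelines read Feature Views (offline) and produce model artifacts for the Model Registry and an encoder used for Vector DB index creation. Inference pipelines read Feature Views (online) and serve Batch Apps and Online Apps. Versioning, schemas, transformation functions, experiments, and provenance are annotated at each stage.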
✓ Versioning →
■ code: feature engineering, transformation functions, model training, and model serving scripts
■ assets: model files, model artifacts, experiments
■ configuration: experiment settings, deployments, indexes
✓ Schema management → columns and data types for feature groups (FGs), feature views (FVs), models, and deployments
✓ Transformation functions → apply the same transformations in training and serving to avoid training/serving skew
✓ Provenance and lineage → trace each prediction back to the ingested features
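The transformation-function point can be illustrated with a minimal sketch, assuming a z-score scaler as the example transformation (the class and the sample values are hypothetical, not a Hopsworks API). The statistics are fitted once on training data, and the same fitted object is reused at serving time, so training and online inference apply identical arithmetic.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass(frozen=True)
class StandardScaler:
    """Hypothetical transformation function: z-score with statistics frozen at fit time."""

    mu: float
    sigma: float

    @classmethod
    def fit(cls, values: List[float]) -> "StandardScaler":
        # Statistics come from the training data only.
        sigma = pstdev(values)
        return cls(mu=mean(values), sigma=sigma if sigma > 0 else 1.0)

    def transform(self, value: float) -> float:
        return (value - self.mu) / self.sigma


# Training pipeline: fit on the training data, then transform it.
train_amounts = [10.0, 20.0, 30.0, 40.0]
scaler = StandardScaler.fit(train_amounts)
train_features = [scaler.transform(v) for v in train_amounts]

# Online inference: reuse the same fitted scaler (e.g. versioned with the model).
# Recomputing mu/sigma from live requests instead would cause training/serving skew.
online_feature = scaler.transform(25.0)
print(online_feature)  # 0.0
```

Storing the fitted statistics with the model version, rather than recomputing them in the serving code, is what keeps the two paths from drifting apart.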
6. A Closer Look at Inference Pipelines
Figure: The pipeline overview extended with inference pipelines. Batch inference jobs read batch data from Feature Views (offline), load the model artifact from the Model Registry, and write batch predictions for Batch Apps. For Online Apps, a Prediction Service (Transformer and Predictor, with the model artifact) reads recent features from Feature Views (online) and embeddings from the Vector DB, returns online predictions, and an inference logger writes inference logs.
7. A Deeper Look at Real-time Inference Pipelines
Figure: A streaming feature pipeline writes to Feature Groups (FG 1 to FG 5) in the Feature Store, and Feature Views (FV 1 to FV 3) select features from them, keyed by primary keys (pk). An inference request reaches Model Serving, where the Transformer uses the primary key to look up a feature vector and fetches embeddings from a Vector DB through similarity search in the embedding space. The Predictor turns the model input into a prediction that is returned as the inference response to Online Apps. Lookups, feedback, and inference logs are recorded along the way.
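The request path in this real-time inference figure (the Transformer looks up features by primary key, the Predictor produces the prediction, an inference logger records it) can be sketched in plain Python. This is a minimal sketch under stated assumptions: a dict stands in for the online feature store, a hand-written linear function stands in for the model artifact, and all names and values are hypothetical rather than the Hopsworks or KServe APIs.

```python
from typing import Dict, List

# Hypothetical online feature store contents: primary key -> precomputed features.
ONLINE_FEATURES: Dict[int, Dict[str, float]] = {
    42: {"feature_5": 0.3, "feature_6": 1.2, "feature_8": 7.0},
}
FEATURE_ORDER = ["feature_5", "feature_6", "feature_8"]

# Hypothetical model weights standing in for the deployed model artifact.
WEIGHTS = [0.5, -1.0, 0.1, 0.01]

INFERENCE_LOG: List[Dict] = []


def transformer(request: Dict) -> List[float]:
    """Pre-processing: look up the feature vector by primary key, then
    append a feature that is only known at request time."""
    stored = ONLINE_FEATURES[request["pk"]]
    vector = [stored[name] for name in FEATURE_ORDER]
    vector.append(request["amount"])
    return vector


def predictor(model_input: List[float]) -> float:
    """Hypothetical linear model: weighted sum of the model input."""
    return sum(w * x for w, x in zip(WEIGHTS, model_input))


def handle_inference_request(request: Dict) -> Dict:
    model_input = transformer(request)
    prediction = predictor(model_input)
    # Inference logger: keep the model input and prediction for monitoring and retraining.
    INFERENCE_LOG.append({"pk": request["pk"], "model_input": model_input, "prediction": prediction})
    return {"prediction": prediction}


response = handle_inference_request({"pk": 42, "amount": 100.0})
```

Splitting the work this way keeps feature lookup (I/O bound) separate from model execution (compute bound), which mirrors the Transformer/Predictor split on the slide.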
10. What about Multi-Modal Similarity Search?
Can a “user query” find “items” with similarity search?
Yes, by mapping the “user query” embedding into the “item” embedding space with a two-tower model.
Representation learning for retrieval usually involves supervised learning with labeled or pseudo-labeled data from user-item interactions.
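To make the mapping idea concrete, here is a minimal sketch, assuming each tower is a single linear layer with hand-picked (hypothetical) weights. The query tower and the item tower take inputs of different sizes, but both output 2-dimensional embeddings, so a query and an item can be compared directly with a dot product. In a real two-tower model the weights are learned from user-item interactions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def tower(weights: Matrix, features: Sequence[float]) -> List[float]:
    """A one-layer linear tower: maps an input vector to an embedding (one output per row)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical weights: 3 query features and 2 item features, both mapped to 2 dimensions.
QUERY_TOWER: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
ITEM_TOWER: Matrix = [[2.0, 0.0], [0.0, 1.0]]

query_embedding = tower(QUERY_TOWER, [1.0, 0.5, 0.5])  # [1.0, 1.0]
item_embeddings = {
    "item_a": tower(ITEM_TOWER, [0.5, 1.0]),   # [1.0, 1.0]
    "item_b": tower(ITEM_TOWER, [0.0, -1.0]),  # [0.0, -1.0]
}

# Similarity search in the shared item embedding space.
scores = {item: dot(query_embedding, emb) for item, emb in item_embeddings.items()}
best_item = max(scores, key=scores.get)
```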
11. Training data for our Two-Tower Model will be User-Item Interactions
Log user-item interactions as training data for our two-tower model and ranking model.
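Example (retail website): a search returns Items 1 to 4; the user clicks Item 2, clicks Item 3, and purchases Item 3. Each item is logged with its features and an interaction score:
• Item 1: score 0 (no interaction)
• Item 2: score 1 (click)
• Item 3: score 5 (click and purchase)
• Item 4: score 0 (no interaction)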
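Turning logged events into per-item scores like those in the retail example (Item 2 scores 1, Item 3 scores 5, the rest 0) could look like the following minimal sketch. The event weights are an assumption inferred from the example (a click scores 1, a purchase 5, and an item keeps its strongest event); the function and field names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed event weights, chosen to reproduce the example scores.
EVENT_SCORE = {"click": 1, "purchase": 5}


def interaction_scores(shown_items: List[str],
                       events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Score every item shown for a search; items with no events score 0."""
    scores = {item: 0 for item in shown_items}
    for item, event in events:
        # Keep the strongest interaction seen for the item.
        scores[item] = max(scores[item], EVENT_SCORE[event])
    return scores


def training_rows(query_id: str, scores: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """One (query, item, score) row per shown item; item features are joined later."""
    return [(query_id, item, score) for item, score in scores.items()]


shown = ["item_1", "item_2", "item_3", "item_4"]
events = [("item_2", "click"), ("item_3", "click"), ("item_3", "purchase")]
scores = interaction_scores(shown, events)
print(scores)  # {'item_1': 0, 'item_2': 1, 'item_3': 5, 'item_4': 0}
rows = training_rows("query_1", scores)
```

Logging the items that were shown but ignored matters: they become the zero-score (non-interaction) examples.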
12. Training the Two-Tower Embedding Model
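Figure: The user query encoder maps user features, preferences, and history to user query embeddings; the item encoder maps item category, price, popularity, etc. to item embeddings. The dot product of the two embeddings feeds the loss function, which is trained on user-item interactions as training data, with labels ranging from 0 (non-interaction) to 1 (highest interaction).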
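The training loop in the figure can be sketched with linear towers and plain Python. This is a minimal sketch under stated assumptions, not the deck's implementation: each tower is one linear layer, the dot product of the two embeddings goes through a sigmoid, and the loss is binary cross-entropy against 0/1 interaction labels; the toy data, dimensions, and learning rate are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Example = Tuple[List[float], List[float], float]  # (user features, item features, label)


def encode(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Linear tower: one embedding dimension per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_two_tower(data: List[Example], user_dim: int, item_dim: int,
                    emb_dim: int = 4, lr: float = 0.1, epochs: int = 200,
                    seed: int = 0) -> Tuple[Matrix, Matrix]:
    rng = random.Random(seed)
    w_user = [[rng.uniform(-0.1, 0.1) for _ in range(user_dim)] for _ in range(emb_dim)]
    w_item = [[rng.uniform(-0.1, 0.1) for _ in range(item_dim)] for _ in range(emb_dim)]
    for _ in range(epochs):
        for x_user, x_item, label in data:
            u, v = encode(w_user, x_user), encode(w_item, x_item)
            # Gradient of binary cross-entropy with respect to the dot-product logit.
            g = sigmoid(dot(u, v)) - label
            for k in range(emb_dim):
                for j in range(user_dim):
                    w_user[k][j] -= lr * g * v[k] * x_user[j]
                for j in range(item_dim):
                    w_item[k][j] -= lr * g * u[k] * x_item[j]
    return w_user, w_item


# Toy interactions: user A interacted with item X only, user B with item Y only.
USER_A, USER_B = [1.0, 0.0], [0.0, 1.0]
ITEM_X, ITEM_Y = [1.0, 0.0], [0.0, 1.0]
DATA: List[Example] = [
    (USER_A, ITEM_X, 1.0), (USER_A, ITEM_Y, 0.0),
    (USER_B, ITEM_Y, 1.0), (USER_B, ITEM_X, 0.0),
]

w_user, w_item = train_two_tower(DATA, user_dim=2, item_dim=2)


def score(x_user: Sequence[float], x_item: Sequence[float]) -> float:
    """Dot product of the trained user and item embeddings."""
    return dot(encode(w_user, x_user), encode(w_item, x_item))
```

After training, the two towers can be used separately: the item tower to precompute item embeddings offline, the user tower at request time.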
13. Model Training for Embedding Models and Ranking Model
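Figure: In the Feature Store, Feature Views over items and user queries (built from item, user, and click data) produce the training data files retrieval.csv and ranking.csv. These are used to train the User/Query Embedding and Item Embedding models and the Ranking model, which are saved to the Hopsworks Model Registry.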
14. Build the ANN Index on Items and Run Similarity Search with User Queries
Figure: A job reads items.csv, encodes all items to compute their embeddings, and inserts all (item-ID, embedding) pairs into an OpenSearch k-NN (ANN) index.
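The indexing job can be sketched as follows, assuming an exact brute-force k-NN search over inner products as a stand-in for the approximate (ANN) search that OpenSearch k-NN performs. The class, the toy item encoder, and the item data are hypothetical and are not the OpenSearch client API.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class BruteForceIndex:
    """Exact k-NN over (item-ID, embedding) pairs. An ANN index trades a little
    recall for much faster search; this version scans every item."""

    def __init__(self) -> None:
        self._items: Dict[str, Embedding] = {}

    def insert(self, item_id: str, embedding: Sequence[float]) -> None:
        self._items[item_id] = list(embedding)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        scored = ((item_id, dot(query, emb)) for item_id, emb in self._items.items())
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def build_index(items: List[Tuple[str, List[float]]],
                item_encoder: Callable[[List[float]], Embedding]) -> BruteForceIndex:
    """Encode all items and insert every (item-ID, embedding) pair."""
    index = BruteForceIndex()
    for item_id, features in items:
        index.insert(item_id, item_encoder(features))
    return index


def toy_item_encoder(features: List[float]) -> Embedding:
    """Hypothetical stand-in for the trained item embedding model."""
    return [2.0 * x for x in features]


ITEMS = [("item_1", [1.0, 0.0]), ("item_2", [0.0, 1.0]), ("item_3", [0.7, 0.7])]
index = build_index(ITEMS, toy_item_encoder)
top2 = index.search([1.0, 0.0], k=2)  # query embedding from the user/query tower
```

Because items are encoded once by a batch job, only the user/query tower has to run at request time.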
15. Two-Tower Network with a Vector Database for ANN Search
Source: https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search
16. Retrieval and Ranking for Personalized Real-time Recommendation Systems
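Figure: The User-Query Encoder turns user/query features into a User-Query Embedding, which drives candidate retrieval: a search against OpenSearch k-NN (items) returns candidate items. Features for those items (including trends and feedback) are fetched from the Hopsworks Feature Store, and the Ranking Model scores the candidates to produce the ranked items.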
17. Real-time Recommendation Systems
Figure: Recommendation request → Query Model → retrieve the closest candidates using similarity search (Candidate 1 to Candidate N) → enrich the candidates with item/user features → Ranking Model → ranked candidates → recommended candidates
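The recommendation request flow for slide 17 can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions: the query model, the item embeddings, the feature lookups, and the ranking formula (a weighted sum of similarity and popularity) are all hypothetical stand-ins for the trained models, the vector index, and the feature store.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical item embeddings (vector index) and item features (feature store).
ITEM_EMBEDDINGS: Dict[str, List[float]] = {
    "a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0], "d": [-1.0, 0.0],
}
ITEM_FEATURES: Dict[str, Dict[str, float]] = {
    "a": {"popularity": 0.1}, "b": {"popularity": 0.9},
    "c": {"popularity": 0.5}, "d": {"popularity": 1.0},
}


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_model(user: Dict) -> List[float]:
    """Hypothetical query tower: here the user's stored taste vector is the embedding."""
    return user["taste"]


def retrieve(query_emb: List[float], k: int) -> List[Tuple[str, float]]:
    """Retrieve the k closest candidates by similarity search."""
    scored = [(item, dot(query_emb, emb)) for item, emb in ITEM_EMBEDDINGS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def enrich(candidates: List[Tuple[str, float]], user: Dict) -> List[Dict]:
    """Join each candidate with item features and user features."""
    return [{"item": item, "similarity": sim, **ITEM_FEATURES[item], "user_id": user["id"]}
            for item, sim in candidates]


def rank(rows: List[Dict]) -> List[str]:
    """Hypothetical ranking model: weighted sum of similarity and popularity."""
    scored = sorted(rows, key=lambda r: 0.5 * r["similarity"] + 0.5 * r["popularity"],
                    reverse=True)
    return [r["item"] for r in scored]


def recommend(user: Dict, k_retrieve: int = 3, k_final: int = 2) -> List[str]:
    candidates = retrieve(query_model(user), k_retrieve)
    return rank(enrich(candidates, user))[:k_final]


recommendations = recommend({"id": "u1", "taste": [1.0, 0.0]})
```

Item "d" is the most popular, but it is never ranked because retrieval did not return it; the ranking model only reorders what retrieval lets through.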
18. Real-time Recommendation Systems with Hopsworks
Figure: The same flow on Hopsworks. The user query model runs as a KServe deployment (Transformer and Predictor) and retrieves the closest candidates with similarity search on OpenSearch k-NN. The candidates are enriched with item/user features from the Hopsworks Feature Store, and the ranking model, also a KServe deployment (Transformer and Predictor), returns the ranked candidates as the recommended candidates.
19. Extended Retrieval and Ranking Architecture
Embeddings, Retrieval, Filtering, Ranking
• User/Query & Item Embeddings: jointly train the user/query embedding model and the item embedding model as a two-tower model, then build an Approximate Nearest Neighbor (ANN) index from the items and the item embedding model.
• Retrieval: retrieve candidate items from the ANN index by similarity search on the user embedding.
• Filtering: remove candidate items for various reasons, such as an underage user, an item that is sold out, an item the user bought before, or an item not available in the user's region.
• Ranking: with a ranking model, score all the candidate items using both user and item features, while ensuring candidate diversity.
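The filtering stage amounts to a set of business rules applied between retrieval and ranking. The sketch below is a minimal illustration, assuming each item carries a stock flag, a set of allowed regions, and a minimum age, and each user carries an age, a region, and a purchase history; these fields and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class User:
    age: int
    region: str
    purchased: Set[str] = field(default_factory=set)


@dataclass
class Item:
    item_id: str
    in_stock: bool
    regions: Set[str]
    min_age: int = 0  # hypothetical age restriction on the item


def keep_candidate(user: User, item: Item) -> bool:
    """Return False if any of the filtering rules applies."""
    if user.age < item.min_age:          # underage user
        return False
    if not item.in_stock:                # item sold out
        return False
    if item.item_id in user.purchased:   # item bought before
        return False
    if user.region not in item.regions:  # item not available in the user's region
        return False
    return True


def filter_candidates(user: User, candidates: List[Item]) -> List[Item]:
    return [item for item in candidates if keep_candidate(user, item)]


user = User(age=16, region="SE", purchased={"i2"})
candidates = [
    Item("i1", in_stock=True, regions={"SE"}),
    Item("i2", in_stock=True, regions={"SE"}),              # bought before
    Item("i3", in_stock=False, regions={"SE"}),             # sold out
    Item("i4", in_stock=True, regions={"SE"}, min_age=18),  # age-restricted
    Item("i5", in_stock=True, regions={"US"}),              # wrong region
]
kept = [item.item_id for item in filter_candidates(user, candidates)]
print(kept)  # ['i1']
```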