12. The Challenge
2012: 30-40 models, leveraging billions of events, creating 100 million+ scores
2015: over 1,000 models, leveraging trillions of events, creating 150 billion+ scores/day
13. The Opportunity
What we really need:
A system that creates as many models as we want, when we want them, and that dynamically adapts in real time to changing conditions
▪ Automatically creates, validates, ships, and monitors models, with a capacity that scales to tens of thousands of models
15. Introducing Online Learning
Batch (Creation): Given a complete data set, a batch model is created in its entirety, all at once.
Online Learning (Evolution): Online models evolve and adapt over time, reacting to a changing environment with each and every event.
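To make the creation/evolution contrast concrete, here is a minimal Python sketch (not the system described in these slides) that estimates a conversion rate two ways: a batch estimate computed once over the complete data set, and an online estimate updated with every event using an exponentially weighted moving average. The function and class names and the decay parameter `alpha` are hypothetical.

```python
from typing import List


def batch_rate(events: List[int]) -> float:
    """Batch: given the complete data set, compute the estimate all at once."""
    return sum(events) / len(events)


class OnlineRate:
    """Online: evolve the estimate with each and every event.

    An exponentially weighted moving average weights recent events more than
    old ones, so the estimate drifts toward a changed environment.
    `alpha` (hypothetical) controls how quickly old events are forgotten.
    """

    def __init__(self, alpha: float = 0.05, initial: float = 0.0) -> None:
        self.alpha = alpha
        self.rate = initial

    def update(self, converted: int) -> float:
        # Move a fraction `alpha` of the way toward the newest observation.
        self.rate += self.alpha * (converted - self.rate)
        return self.rate


# The environment shifts halfway through: conversions go from 10% to 50%.
stream = ([1] + [0] * 9) * 50 + [1, 0] * 250
online = OnlineRate(alpha=0.05)
for event in stream:
    online.update(event)

print(f"batch estimate:  {batch_rate(stream):.2f}")
print(f"online estimate: {online.rate:.2f}")
```

The batch estimate averages over both regimes (0.30 here), while the online estimate tracks the newer rate of roughly 50%. That responsiveness is what evolution buys, at the cost of noisier estimates and, as the next slide notes, harder evaluation.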
16. Batch Models (Offline) vs. Online Learning
Batch Models (Offline):
● Large-scale data storage
● Large-scale data movement
● Painful data aggregation
● Lots of manual everything
● Harder to build models, but easier to evaluate
Online Learning:
● Limited data storage, mostly for monitoring
● Event-level data streams
● Light data aggregation
● Lots of automatic everything
● Easier to build, but harder to evaluate (and support)
17. Some Techno Mumbo Jumbo
Stochastic gradient descent with L1 regularization
● Outperformed both L2 and Elastic Net regularization
● Leverages small (‘micro’) batches
● Validates and monitors models in real time
● Alerts the team when models are not behaving
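The slide names the technique but not its details. Below is a minimal Python sketch, assuming a logistic-regression conversion model: each micro-batch contributes one averaged gradient step, and the L1 penalty is then applied with a soft-thresholding (proximal) step. That is one standard way to combine SGD with L1; whether eXtream does it this way is not stated, and the class name, learning rate, penalty strength, and batch size are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t * |w|: shrink toward zero, clip at zero.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


class L1LogisticSGD:
    """Hypothetical online model: SGD on log-loss plus an L1 proximal step."""

    def __init__(self, n_features: int, lr: float = 0.1, l1: float = 0.01) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.l1 = l1

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def partial_fit(self, batch: List[Example]) -> None:
        """One gradient step on a micro-batch, then soft-threshold the weights."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict_proba(x) - y  # d(log-loss)/d(logit)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        n = len(batch)
        self.b -= self.lr * grad_b / n  # the bias is not regularized
        self.w = [
            soft_threshold(wj - self.lr * gj / n, self.lr * self.l1)
            for wj, gj in zip(self.w, grad_w)
        ]


rng = random.Random(0)


def make_event() -> Example:
    # Synthetic event: only feature 0 drives conversion; features 1-4 are noise.
    x = [rng.gauss(0, 1) for _ in range(5)]
    y = 1 if rng.random() < sigmoid(2.0 * x[0]) else 0
    return x, y


model = L1LogisticSGD(n_features=5, lr=0.1, l1=0.02)  # hypothetical settings
for _ in range(2000):  # 2,000 micro-batches of 10 events each
    model.partial_fit([make_event() for _ in range(10)])
print([round(w, 2) for w in model.w])
```

The soft-thresholding step is what lets L1 set the weights of uninformative features to exactly zero when their gradients stay small, which is the usual reason to choose it over L2 for sparse models. With these particular settings the noise weights end up small rather than necessarily zero.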
19. Serving Layer
eXpresso Serving Cluster: 10B+ events/day, 300+ nodes across 4 data centers
eXtream Modeling Cluster: 160B models/day, 100+ nodes across 4 data centers
JGroups Distributed Messaging
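For a rough sense of per-node load, the cluster figures on this slide can be turned into average rates. This sketch assumes load is spread evenly across nodes and across the day, which real traffic rarely is, so peak rates will be higher. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_node_per_second(per_day: float, nodes: int) -> float:
    """Average rate per node, assuming uniform load over nodes and time."""
    return per_day / SECONDS_PER_DAY / nodes


serving = per_node_per_second(10e9, 300)    # eXpresso: 10B+ events/day, 300+ nodes
modeling = per_node_per_second(160e9, 100)  # eXtream: 160B models/day, 100+ nodes

print(f"eXpresso serving: ~{serving:,.0f} events per node per second")
print(f"eXtream modeling: ~{modeling:,.0f} models per node per second")
```

Because several figures are given as lower bounds ("10B+", "300+", "100+"), these are ballpark averages rather than measured per-node rates.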
21. The Online Learning Challenge
Batch Models (Offline):
● Predefined ratio
● Predefined feature selection
● One-time validation
Online Learning (Streaming):
● Downsampling
● Automated feature selection
● Ongoing data cleaning
● Ongoing validation
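Where a batch pipeline can fix the class ratio up front, a streaming pipeline has to downsample as events arrive. A minimal sketch, assuming the goal is a target share of not-converted events and that the current conversion rate r is known or estimated: keep every converted event, and keep each not-converted event with probability p = s·r / ((1 − s)(1 − r)), which gives a not-converted share of s in expectation. The function names and the 60% target (borrowed from the validation ratio on slide 27) are illustrative assumptions.

```python
import random
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (event id, converted flag in {0, 1})


def negative_keep_rate(conversion_rate: float, target_negative_share: float) -> float:
    """Keep probability for not-converted events so that, in expectation, they
    make up `target_negative_share` of the sample. Assumes 0 < rate < 1."""
    r, s = conversion_rate, target_negative_share
    p = s * r / ((1.0 - s) * (1.0 - r))
    return min(1.0, p)  # if negatives are already scarce, keep them all


def downsample(events: Iterable[Event], keep_rate: float, rng: random.Random) -> Iterator[Event]:
    """Keep every converted event; keep not-converted ones with prob. keep_rate."""
    for event in events:
        _, converted = event
        if converted or rng.random() < keep_rate:
            yield event


# Synthetic stream with roughly a 2% conversion rate (hypothetical).
rng = random.Random(42)
stream = [(f"e{i}", 1 if rng.random() < 0.02 else 0) for i in range(200_000)]

p = negative_keep_rate(conversion_rate=0.02, target_negative_share=0.6)
sample = list(downsample(stream, p, random.Random(7)))
share = sum(1 for _, c in sample if c == 0) / len(sample)
print(f"keep rate {p:.4f}, sample size {len(sample)}, not-converted share {share:.2f}")
```

Downsampling not-converted events inflates predicted conversion probabilities, so if calibrated scores are needed, kept not-converted events are commonly weighted by 1 / keep_rate during training, or scores are corrected afterward.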
22. eXtream as a Framework for Online Learning
Why it works:
● All necessary data already exists in eXtream
● The cluster’s processing resources can be better utilized
● eXtream addresses most performance/scalability requirements
● The scoring mechanism already exists
27. Model Validation
● Sliding window of recent events
● 60/40 not-converted/converted ratio
● Various accuracy metrics (lift, precision, recall, confusion matrix)
● Decide whether the model is ready to make predictions
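The validation steps on this slide can be sketched as a sliding-window evaluator. Everything specific here is an assumption: the class names, the window size, the 0.5 score threshold, the readiness thresholds, and the definition of lift as precision divided by the window's base conversion rate (one common definition; the slide does not say which is used).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass
class Metrics:
    """Confusion matrix for the current window, plus derived metrics."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def lift(self) -> float:
        # Precision relative to the base conversion rate in the window.
        total = self.tp + self.fp + self.fn + self.tn
        base = (self.tp + self.fn) / total if total else 0.0
        return self.precision / base if base else 0.0


class SlidingWindowValidator:
    """Compares recent scores with observed outcomes over a fixed-size window."""

    def __init__(self, window: int = 1000, threshold: float = 0.5) -> None:
        # Each entry is (score, converted); the oldest drops out when full.
        self.events: Deque[Tuple[float, int]] = deque(maxlen=window)
        self.threshold = threshold

    def add(self, score: float, converted: int) -> None:
        self.events.append((score, converted))

    def metrics(self) -> Metrics:
        tp = fp = fn = tn = 0
        for score, converted in self.events:
            predicted = score >= self.threshold
            if predicted and converted:
                tp += 1
            elif predicted:
                fp += 1
            elif converted:
                fn += 1
            else:
                tn += 1
        return Metrics(tp, fp, fn, tn)

    def is_ready(self, min_precision: float = 0.6, min_recall: float = 0.5,
                 min_lift: float = 1.2) -> bool:
        """Hypothetical gate: predict only once the window is full and every
        metric clears its bar."""
        m = self.metrics()
        return (len(self.events) == self.events.maxlen
                and m.precision >= min_precision
                and m.recall >= min_recall
                and m.lift >= min_lift)


# A full window with a 60/40 not-converted/converted ratio.
validator = SlidingWindowValidator(window=1000)
for _ in range(320):
    validator.add(0.8, 1)  # converted, scored high
for _ in range(80):
    validator.add(0.3, 1)  # converted, scored low (missed)
for _ in range(60):
    validator.add(0.7, 0)  # not converted, scored high (false alarm)
for _ in range(540):
    validator.add(0.2, 0)  # not converted, scored low
m = validator.metrics()
print(m, f"precision={m.precision:.2f} recall={m.recall:.2f} lift={m.lift:.2f}")
print("ready:", validator.is_ready())
```

Because the window slides, the same gate can be re-checked continuously; a model that later fails it is a natural trigger for the alerting mentioned on slide 17.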
28. Predictions Mechanism
● Two phases (Scoring, Re-code)
● Scale vs. accuracy tradeoff