Charmee Patel from Syntasa discusses building and deploying 9 models to production in 3 weeks to support media buying decisions for certain product segments using clickstream and enterprise data from ~2M visitors and ~100K SKUs. Key challenges included high data volume, complexity, non-stationarity, and reliability in production. Syntasa addressed this through an experiment process template, feature store, and ensemble modeling approach. Results showed significant lift over rule-based approaches, with the bespoke algorithmic models driving much higher conversion rates and marketing activity.
2. Charmee Patel, Syntasa
No REST till Production – Building and Deploying 9 Models to Production in 3 weeks
#UnifiedDataAnalytics #SparkAISummit
3. About SYNTASA
• Offices in Washington, DC and London
• Marketing AI Platform used by large enterprises
• Fits natively in all Hadoop distros & clouds
• Customers include several household brands
4. About SYNTASA
• 50+ production models
• 100s of behavioural data sources
• 100s of experimental models
• ~1B unique visitors and customer activities
• ~30B events monthly
• Billions of predictions served
• Trillions of historical records
5. Why care about behavioural data?
• Media optimisation
• Recommendation
• Fraud detection
• Churn reduction
[Diagram: behavioural channels (Web, Mobile, IVR, email) and enterprise systems (CRM, Financials, ERP) feeding into the Company's data]
6. Our Christmas Project
Support media buying decisions for certain product segments
Background
• Clickstream data
• ~2M visitors a day
• ~100k SKUs
• Products of interest: <0.1% conversion rate
Existing marketing activity
• Building rules-based audiences
• Using black-box AI models in their Martech and Adtech tools
We built bespoke models using their behavioural + enterprise data
7. Challenges
• High volume
• Complex
• Non-stationary
• Hard to featurise
• Training requires the full data
• Reliability in productionising models
• Timely inference at scale
• Models drift
9. Feature Store
Features @ Visitor level
• Last 7 days
• Interaction with certain pages, products, cart
• ~400 form elements that were available in tracking
• Total general activity
• Features include zero and non-zero counts of fields and one-hot encoded values
Initial ~1,000 features; down-weighting features based on variance resulted in ~400 features (a sketch of this build follows below)
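The deck doesn't show Syntasa's actual schema, so the following is only a minimal PySpark sketch of this kind of visitor-level feature build; the table and column names (clickstream, visitor_id, event_date, page_type) are hypothetical.

```python
# A minimal sketch of the visitor-level feature build, assuming a hypothetical
# clickstream table with columns visitor_id, event_date, page_type.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-store-sketch").getOrCreate()
clicks = spark.table("clickstream")  # hypothetical source table

# Last 7 days only.
recent = clicks.where(F.col("event_date") >= F.date_sub(F.current_date(), 7))

# Visitor-level counts: certain pages/products/cart, plus total activity.
features = recent.groupBy("visitor_id").agg(
    F.count("*").alias("total_events"),
    F.sum(F.when(F.col("page_type") == "product", 1).otherwise(0)).alias("product_views"),
    F.sum(F.when(F.col("page_type") == "cart", 1).otherwise(0)).alias("cart_events"),
)

# Non-zero flag alongside the raw count, as described on the slide.
features = features.withColumn(
    "has_cart_activity", (F.col("cart_events") > 0).cast("int")
)

# Variance-based pruning (the ~1,000 -> ~400 step): drop near-constant columns.
num_cols = ["total_events", "product_views", "cart_events", "has_cart_activity"]
variances = features.agg(*[F.variance(c).alias(c) for c in num_cols]).first().asDict()
kept = [c for c, v in variances.items() if v is not None and v > 1e-3]
feature_store = features.select("visitor_id", *kept)
```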
10. Experiment setup
3 datasets
• Training period Nov 2018, split into train & test
• Additional evaluation on Dec 2018
Statistical metrics
• F1 score, due to class imbalance
Business metrics
• Even with a good model, what does that mean for the campaign?
• Campaigns need a minimum sample size for A/B testing
• How do we find the right audience and confirm the projected positive results for it?
• Lift projections (see the sketch below)
– Lift @ 5%
– Lift @ 20%
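The lift metric can be computed directly from scored visitors: take the top k% by model score and divide their conversion rate by the base rate. A hedged PySpark sketch, assuming hypothetical score (probability) and converted (0/1) columns:

```python
# Lift @ k%: conversion rate among the top-k%-scored visitors vs. the base rate.
from pyspark.sql import DataFrame, functions as F

def lift_at_k(scored: DataFrame, k: float) -> float:
    base_rate = scored.agg(F.avg("converted")).first()[0]
    # Score cutoff for the top k fraction (e.g. k=0.05 for "Lift @ 5%").
    cutoff = scored.approxQuantile("score", [1.0 - k], 0.001)[0]
    top_rate = (
        scored.where(F.col("score") >= cutoff)
              .agg(F.avg("converted"))
              .first()[0]
    )
    return top_rate / base_rate

# e.g. lift_at_k(scored, 0.05)  # "Lift @ 5%"
#      lift_at_k(scored, 0.20)  # "Lift @ 20%"
```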
13. Process Template
Dataset → Processes → Dataset
• aka Functors
Why Processes?
• UDFs/UDAFs not always the right fit
• Custom transformers on top of Spark transform are too cumbersome
• Abstracts away Spark idiosyncrasies
• Allows re-use by team members of different skill levels
• Battle tested and unit tested (illustrated below)
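Syntasa's actual Process API isn't shown in the deck; this is only a hypothetical sketch of the idea: a Process is a named, unit-testable function from DataFrame to DataFrame, and composition keeps the Dataset → Processes → Dataset shape without exposing Spark idiosyncrasies.

```python
from dataclasses import dataclass
from typing import Callable

from pyspark.sql import DataFrame

@dataclass
class Process:
    name: str
    fn: Callable[[DataFrame], DataFrame]

    def run(self, ds: DataFrame) -> DataFrame:
        return self.fn(ds)

    def then(self, other: "Process") -> "Process":
        # Functor-style composition: Process >> Process is still a Process.
        return Process(f"{self.name} >> {other.name}",
                       lambda ds: other.run(self.run(ds)))

# Reusable steps that team members of any Spark skill level can chain:
dedupe = Process("dedupe", lambda ds: ds.dropDuplicates(["visitor_id"]))
recent = Process("recent",
                 lambda ds: ds.where("event_date >= date_sub(current_date(), 7)"))
pipeline = dedupe.then(recent)
# featured = pipeline.run(spark.table("clickstream"))
```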
14. Experiments
Multiclass model
• Severe class imbalance (<0.1%)
• Poor learning and evaluation metrics
What if we build several binary models?
• Initial results promising
Several algorithms and hyperparameters tested (LR, RF, GBM) – see the sketch below
First best model results – Random Forest
• Learning (F1 score): 0.90
• Eval on test split (F1 score): 0.85
• Eval on December: 0.70 (!) – a sharp drop on out-of-period data
• Lift @ 5%: 9.5x
Next best model results – Logistic Regression
• Learning (F1 score): 0.89
• Eval on test split (F1 score): 0.87
• Eval on December: 0.78 – holds up better on the non-stationary data
• Lift @ 5%: 9x
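A minimal Spark ML sketch of the several-binary-models idea: one model per product segment instead of a single multiclass model. LogisticRegression stands in for the full LR/RF/GBM sweep, and the table, label columns, and product names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.getOrCreate()
dataset = spark.table("training_data")  # hypothetical: "features" vector + 0/1 labels

train, test = dataset.randomSplit([0.8, 0.2], seed=42)

products = ["product_a", "product_b"]  # hypothetical product segments
models = {}
for product in products:
    # One binary classifier per product, trained on that product's label column.
    lr = LogisticRegression(featuresCol="features", labelCol=f"label_{product}")
    model = lr.fit(train)
    f1 = MulticlassClassificationEvaluator(
        labelCol=f"label_{product}", metricName="f1"
    ).evaluate(model.transform(test))
    print(f"{product}: test F1 = {f1:.2f}")
    models[product] = model
```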
15. Production
• Several models for each product
• Ensemble predictions for each product separately
• Call a REST API to push predictions at scale to Ad Networks (sketched below)
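As a closing illustration, a hedged sketch of that hand-off: average the per-product model scores into an ensemble, then push them to an ad network over REST in partition-sized batches. The endpoint URL, payload shape, and score column names are hypothetical, not the actual integration.

```python
import json

import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
predictions = spark.table("scored_visitors")  # hypothetical per-model scores

# Ensemble: mean of each model's probability column.
score_cols = ["score_m1", "score_m2", "score_m3"]  # hypothetical columns
ensembled = predictions.withColumn(
    "ensemble_score",
    sum(F.col(c) for c in score_cols) / len(score_cols),
)

def push_partition(rows):
    # One HTTP session per partition keeps connection churn manageable at scale.
    session = requests.Session()
    batch = [{"visitor_id": r["visitor_id"], "score": r["ensemble_score"]}
             for r in rows]
    if batch:
        session.post(
            "https://adnetwork.example.com/audiences",  # hypothetical endpoint
            data=json.dumps(batch),
            headers={"Content-Type": "application/json"},
        )

ensembled.select("visitor_id", "ensemble_score").foreachPartition(push_partition)
```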