Charmee Patel from Syntasa discusses building and deploying 9 models to production in 3 weeks to support media buying decisions for certain product segments using clickstream and enterprise data from ~2M visitors and ~100K SKUs. Key challenges included high data volume, complexity, non-stationarity, and reliability in production. Syntasa addressed this through an experiment process template, feature store, and ensemble modeling approach. Results showed significant lift over rule-based approaches, with the bespoke algorithmic models driving much higher conversion rates and marketing activity.
2. Charmee Patel, Syntasa
No REST till Production – Building and Deploying 9 Models to Production in 3 weeks
#UnifiedDataAnalytics #SparkAISummit
3. About SYNTASA
• Offices in Washington, DC and London
• Marketing AI Platform used by large enterprises
• Fits natively in all Hadoop distros & clouds
• Customers include several household brands
4. About SYNTASA
• 50+ production models
• 100s of behavioural data sources
• 100s of experimental models
• ~1B unique visitors and customer activities
• ~30B events monthly
• Billions of predictions served
• Trillions of historical records
5. Why care about behavioural data?
• Media optimisation
• Recommendation
• Fraud detection
• Churn reduction
[Diagram: behavioural channels (Web, Mobile, IVR, email) and enterprise systems (CRM, Financials, ERP) feeding into the Company's data]
6. Our Christmas Project
Support media buying decisions for certain product segments
Background
• Clickstream data
• ~2M visitors a day
• ~100k SKUs
• Products of interest: <0.1% conversion rate
Existing marketing activity
• Building rules-based audiences
• Using black-box AI models in their Martech and Adtech tools
We built bespoke models using their behavioural + enterprise data
7. Challenges
• High volume
• Complex
• Non-stationary
• Hard to featurise
• Training requires the full data
• Reliability in productionising models
• Timely inference at scale
• Models drift
9. Feature Store
Features @ Visitor level
• Last 7 days
• Interaction with certain pages, products, cart
• ~400 form elements that were available in tracking
• Total general activity
• Features include zero and non-zero counts of fields and one-hot encoded values
Initial ~1,000 features; down-weighting features based on variance resulted in ~400 features (a sketch of this build follows below)
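The deck doesn't show Syntasa's actual schema, so the following is only a minimal PySpark sketch of this kind of visitor-level feature build; the table and column names (clickstream, visitor_id, event_date, page_type) are hypothetical.

```python
# A minimal sketch of the visitor-level feature build, assuming a hypothetical
# clickstream table with columns visitor_id, event_date, page_type.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-store-sketch").getOrCreate()
clicks = spark.table("clickstream")  # hypothetical source table

# Last 7 days only.
recent = clicks.where(F.col("event_date") >= F.date_sub(F.current_date(), 7))

# Visitor-level counts: certain pages/products/cart, plus total activity.
features = recent.groupBy("visitor_id").agg(
    F.count("*").alias("total_events"),
    F.sum(F.when(F.col("page_type") == "product", 1).otherwise(0)).alias("product_views"),
    F.sum(F.when(F.col("page_type") == "cart", 1).otherwise(0)).alias("cart_events"),
)

# Non-zero flag alongside the raw count, as described on the slide.
features = features.withColumn(
    "has_cart_activity", (F.col("cart_events") > 0).cast("int")
)

# Variance-based pruning (the ~1,000 -> ~400 step): drop near-constant columns.
num_cols = ["total_events", "product_views", "cart_events", "has_cart_activity"]
variances = features.agg(*[F.variance(c).alias(c) for c in num_cols]).first().asDict()
kept = [c for c, v in variances.items() if v is not None and v > 1e-3]
feature_store = features.select("visitor_id", *kept)
```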
10. Experiment setup
3 datasets
• Training period Nov 2018, split into train & test
• Additional evaluation on Dec 2018
Statistical metrics
• F1 score, due to class imbalance
Business metrics
• Even with a good model, what does that mean for the campaign?
• Campaigns need a minimum sample size for A/B testing
• How do we find the right audience and confirm the projected positive results for it?
• Lift projections (see the sketch below)
– Lift @ 5%
– Lift @ 20%
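The lift metric can be computed directly from scored visitors: take the top k% by model score and divide their conversion rate by the base rate. A hedged PySpark sketch, assuming hypothetical score (probability) and converted (0/1) columns:

```python
# Lift @ k%: conversion rate among the top-k%-scored visitors vs. the base rate.
from pyspark.sql import DataFrame, functions as F

def lift_at_k(scored: DataFrame, k: float) -> float:
    base_rate = scored.agg(F.avg("converted")).first()[0]
    # Score cutoff for the top k fraction (e.g. k=0.05 for "Lift @ 5%").
    cutoff = scored.approxQuantile("score", [1.0 - k], 0.001)[0]
    top_rate = (
        scored.where(F.col("score") >= cutoff)
              .agg(F.avg("converted"))
              .first()[0]
    )
    return top_rate / base_rate

# e.g. lift_at_k(scored, 0.05)  # "Lift @ 5%"
#      lift_at_k(scored, 0.20)  # "Lift @ 20%"
```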
13. Process Template
Dataset → Processes → Dataset
• aka Functors
Why Processes?
• UDFs/UDAFs not always the right fit
• Custom transformers on top of Spark transform are too cumbersome
• Abstracts away Spark idiosyncrasies
• Allows re-use by team members of different skill levels
• Battle tested and unit tested (illustrated below)
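Syntasa's actual Process API isn't shown in the deck; this is only a hypothetical sketch of the idea: a Process is a named, unit-testable function from DataFrame to DataFrame, and composition keeps the Dataset → Processes → Dataset shape without exposing Spark idiosyncrasies.

```python
from dataclasses import dataclass
from typing import Callable

from pyspark.sql import DataFrame

@dataclass
class Process:
    name: str
    fn: Callable[[DataFrame], DataFrame]

    def run(self, ds: DataFrame) -> DataFrame:
        return self.fn(ds)

    def then(self, other: "Process") -> "Process":
        # Functor-style composition: Process >> Process is still a Process.
        return Process(f"{self.name} >> {other.name}",
                       lambda ds: other.run(self.run(ds)))

# Reusable steps that team members of any Spark skill level can chain:
dedupe = Process("dedupe", lambda ds: ds.dropDuplicates(["visitor_id"]))
recent = Process("recent",
                 lambda ds: ds.where("event_date >= date_sub(current_date(), 7)"))
pipeline = dedupe.then(recent)
# featured = pipeline.run(spark.table("clickstream"))
```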
14. Experiments
Multiclass model
• Severe class imbalance (<0.1%)
• Poor learning and evaluation metrics
What if we build several binary models?
• Initial results promising
Several algorithms and hyperparameters tested (LR, RF, GBM) – see the sketch below
First best model results – Random Forest
• Learning (F1 score): 0.90
• Eval on test split (F1 score): 0.85
• Eval on December: 0.70 (!) – a sharp drop on out-of-period data
• Lift @ 5%: 9.5x
Next best model results – Logistic Regression
• Learning (F1 score): 0.89
• Eval on test split (F1 score): 0.87
• Eval on December: 0.78 – holds up better on the non-stationary data
• Lift @ 5%: 9x
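A minimal Spark ML sketch of the several-binary-models idea: one model per product segment instead of a single multiclass model. LogisticRegression stands in for the full LR/RF/GBM sweep, and the table, label columns, and product names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.getOrCreate()
dataset = spark.table("training_data")  # hypothetical: "features" vector + 0/1 labels

train, test = dataset.randomSplit([0.8, 0.2], seed=42)

products = ["product_a", "product_b"]  # hypothetical product segments
models = {}
for product in products:
    # One binary classifier per product, trained on that product's label column.
    lr = LogisticRegression(featuresCol="features", labelCol=f"label_{product}")
    model = lr.fit(train)
    f1 = MulticlassClassificationEvaluator(
        labelCol=f"label_{product}", metricName="f1"
    ).evaluate(model.transform(test))
    print(f"{product}: test F1 = {f1:.2f}")
    models[product] = model
```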
15. Production
• Several models for each product
• Ensemble predictions for each product separately
• Call a REST API to push predictions at scale to Ad Networks (sketched below)
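As a closing illustration, a hedged sketch of that hand-off: average the per-product model scores into an ensemble, then push them to an ad network over REST in partition-sized batches. The endpoint URL, payload shape, and score column names are hypothetical, not the actual integration.

```python
import json

import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
predictions = spark.table("scored_visitors")  # hypothetical per-model scores

# Ensemble: mean of each model's probability column.
score_cols = ["score_m1", "score_m2", "score_m3"]  # hypothetical columns
ensembled = predictions.withColumn(
    "ensemble_score",
    sum(F.col(c) for c in score_cols) / len(score_cols),
)

def push_partition(rows):
    # One HTTP session per partition keeps connection churn manageable at scale.
    session = requests.Session()
    batch = [{"visitor_id": r["visitor_id"], "score": r["ensemble_score"]}
             for r in rows]
    if batch:
        session.post(
            "https://adnetwork.example.com/audiences",  # hypothetical endpoint
            data=json.dumps(batch),
            headers={"Content-Type": "application/json"},
        )

ensembled.select("visitor_id", "ensemble_score").foreachPartition(push_partition)
```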