Charmee Patel, Syntasa
No REST till Production –
Building and Deploying 9 Models to
Production in 3 weeks
#UnifiedDataAnalytics #SparkAISummit
About SYNTASA
• Offices in Washington, DC and London
• Marketing AI Platform used by large Enterprises
• Fits natively in all Hadoop distros & Clouds
• Customers include several household brands
• 50+ production models
• 100s of behavioural data sources
• 100s of experimental models
• ~1B unique visitors and customer activities
• 30B Million events monthly
• Billions of predictions served
• Trillions of historical records
Why care about behavioural data?
• Media optimisation
• Recommendation
• Fraud detection
• Churn reduction
[Diagram: company data sources: Mobile, Web, IVR, email, CRM, Financials, ERP]
Our Christmas Project
Support media buying decisions for certain product segments
Background
• Clickstream data
• ~2M visitors a day
• ~100k SKUs
• Products of interest
– <0.1% conversion rate
Existing Marketing activity
• Building rules-based audiences
• Using black-box AI models in their Martech and Adtech tools
We built bespoke models using their behavioral + enterprise data
Challenges
• High volume
• Complex
• Non-stationary
• Hard to featurise
• Training requires the full data
• Reliability when productionizing models
• Timely inference at scale
• Models drift
[Figure: user activity over time (days 1 to 8) with sliding lookback windows, each followed by a prediction]
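The sliding lookback windows in the figure can be sketched as follows. This is a minimal illustration, assuming one prediction per day and a window that ends the day before the prediction day; the function name `lookback_windows` and the example dates are hypothetical, not taken from the talk.

```python
from datetime import date, timedelta

def lookback_windows(first_day, last_day, lookback_days=7):
    """Yield (window_start, window_end, prediction_day) for each prediction day.

    The window covers the `lookback_days` days that end the day before the
    prediction day, so no activity from the prediction day leaks into features.
    """
    prediction_day = first_day + timedelta(days=lookback_days)
    while prediction_day <= last_day:
        window_start = prediction_day - timedelta(days=lookback_days)
        window_end = prediction_day - timedelta(days=1)
        yield window_start, window_end, prediction_day
        prediction_day += timedelta(days=1)

# Ten days of activity with a 7-day lookback leave three prediction days.
windows = list(lookback_windows(date(2018, 11, 1), date(2018, 11, 10)))
for start, end, predict_on in windows:
    print(f"features from {start} to {end} -> predict for {predict_on}")
```

Keeping the window strictly before the prediction day is a design choice that avoids label leakage; the deck does not state how its windows were aligned.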
Feature Store
Features @ Visitor level
• Last 7 days
• Interaction with certain pages, products, cart
• ~400 form elements that were available in tracking
• Total general activity
• Features include zero and non-zero counts of fields and one-hot encoded values
Initial ~1,000 features; down-weighting features based on variance resulted in ~400 features
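A rough sketch of visitor-level count features and a variance-based reduction is shown below. It uses plain Python in place of Spark, and it swaps in a simple hard variance threshold rather than the down-weighting the slide mentions; the event data, field names, and the `min_variance` value are all made up for illustration.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical clickstream events inside one lookback window: (visitor_id, field_name)
events = [
    ("v1", "cart_add"), ("v1", "cart_add"), ("v1", "page_home"),
    ("v2", "page_home"),
    ("v3", "page_home"), ("v3", "cart_add"),
]

def visitor_features(events):
    """Count how often each tracked field occurs per visitor."""
    counts = {}
    for visitor, field in events:
        counts.setdefault(visitor, Counter())[field] += 1
    return counts

def keep_high_variance(features, min_variance):
    """Return the fields whose counts vary enough across visitors to be informative."""
    fields = sorted({f for c in features.values() for f in c})
    kept = []
    for field in fields:
        # A Counter returns 0 for fields a visitor never touched.
        column = [features[v][field] for v in features]
        if pvariance(column) > min_variance:
            kept.append(field)
    return kept

features = visitor_features(events)
selected = keep_high_variance(features, min_variance=0.1)
print(selected)
```

Here `page_home` is dropped because every visitor has the same count, while `cart_add` varies across visitors and is kept.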
Experiment setup
3 datasets
• Training period Nov 2018
Split in test & train
• Additional evaluation on Dec 2018
Statistical Metrics
• F1 score, chosen because of class imbalance
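To see why F1 is preferred over accuracy at a conversion rate around 0.1%, consider a model that never predicts a conversion. The sketch below uses made-up labels (1 converter in 1,000 visitors) and hand-written metric functions; it is an illustration, not the evaluation code from the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: 1,000 visitors, exactly one converter.
y_true = [1] + [0] * 999
always_negative = [0] * 1000

print(accuracy(y_true, always_negative))  # looks excellent
print(f1_score(y_true, always_negative))  # exposes that no converter was found
```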
Business Metrics
• A model can score well statistically, but what does that mean for the campaign?
• Campaigns need a minimum sample size for A/B testing
• How do we find the right audience and confirm the projected positive results for it?
• Lift projections
– Lift @ 5%
– Lift @ 20%
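Lift @ k% is commonly computed as the conversion rate among the top k% of visitors ranked by model score, divided by the overall conversion rate. The sketch below follows that common definition, which is an assumption since the slides do not spell out the formula; the scores and labels are invented.

```python
def lift_at(scores, labels, fraction):
    """Conversion rate in the top `fraction` of visitors by score, relative to the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical audience: 100 visitors, 4 converters, 3 of them ranked in the top 5.
scores = [100 - i for i in range(100)]
labels = [0] * 100
for converter_rank in (0, 1, 3, 50):
    labels[converter_rank] = 1

lift_5 = lift_at(scores, labels, 0.05)   # (3 / 5) / (4 / 100)
lift_20 = lift_at(scores, labels, 0.20)  # (3 / 20) / (4 / 100)
print(lift_5, lift_20)
```

A larger audience slice usually lowers the lift but raises the sample size, which is the trade-off behind reporting both 5% and 20%.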
Accelerating Experimentation
Abstract Away Design Patterns
Process Template
Dataset → Processes → Dataset
• aka Functors
Why Processes?
• UDFs/UDAFs not always the right fit
• Custom transformers on top of Spark transform are too cumbersome
• Abstracts away Spark idiosyncrasies
• Allows re-use by team members of different skill levels
• Battle tested and unit tested
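The Dataset → Processes → Dataset idea can be sketched as a small composable wrapper. This is an illustration only: plain Python lists of dicts stand in for Spark datasets, and the `Process` class, its `then` method, and the example steps are hypothetical names, not Syntasa's actual API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Dataset = List[Row]

class Process:
    """A named, reusable step that maps one dataset to another."""

    def __init__(self, name: str, fn: Callable[[Dataset], Dataset]):
        self.name = name
        self.fn = fn

    def __call__(self, dataset: Dataset) -> Dataset:
        return self.fn(dataset)

    def then(self, other: "Process") -> "Process":
        """Compose two processes into one that runs self first, then other."""
        return Process(f"{self.name} -> {other.name}", lambda ds: other(self(ds)))

# Two hypothetical steps that can be unit tested on their own.
drop_bots = Process("drop_bots", lambda ds: [r for r in ds if not r["is_bot"]])
flag_cart = Process(
    "flag_cart",
    lambda ds: [{**r, "has_cart": r["cart_adds"] > 0} for r in ds],
)

pipeline = drop_bots.then(flag_cart)
raw = [
    {"visitor": "v1", "is_bot": False, "cart_adds": 2},
    {"visitor": "v2", "is_bot": True, "cart_adds": 0},
]
result = pipeline(raw)
print(pipeline.name, result)
```

Because each step takes a dataset and returns a dataset, team members can chain tested steps without touching the engine-specific code inside them.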
Experiments
Multiclass model X
• Severe class imbalance (<0.1%)
• Poor learning and evaluation metrics
What if we build several binary models?
• Initial results promising
Several algorithms and hyperparameters tested (LR, RF, GBM)
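Replacing one multiclass target with several binary targets can be sketched as below: each product gets its own 0/1 label column, and a separate model is then trained per column. The label values and product names are invented, and the helper `binary_targets` is a hypothetical name.

```python
def binary_targets(visitor_labels, products):
    """Turn one multiclass label per visitor into one 0/1 label list per product."""
    return {
        product: [1 if label == product else 0 for label in visitor_labels]
        for product in products
    }

# Hypothetical multiclass labels: which product (if any) each visitor converted on.
labels = ["none", "product_a", "none", "product_b", "none", "product_a"]
targets = binary_targets(labels, ["product_a", "product_b"])

for product, column in targets.items():
    print(product, column)
```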
First best model results – Random forest
• Learning (F1 score) – 0.9
• Eval on test split (F1 score) – 0.85
• Eval on December – 0.7!!
• Lift @ 5% – 9.5x
Next best model results – Logistic Regression
• Learning (F1 score) – 0.89
• Eval on test split (F1 score) – 0.87
• Eval on December – 0.78
• Lift @ 5% – 9x
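The gap between the test-split and December scores can be expressed as a relative drop. The arithmetic below uses the figures from the slide and assumes the December numbers are F1 scores, which the slide does not label explicitly.

```python
def relative_drop(before, after):
    """Fraction of the original score lost between two evaluations."""
    return (before - after) / before

rf_drop = relative_drop(0.85, 0.70)  # random forest: test split vs December
lr_drop = relative_drop(0.87, 0.78)  # logistic regression: test split vs December

print(f"Random forest:       {rf_drop:.1%} drop")
print(f"Logistic regression: {lr_drop:.1%} drop")
```

On these numbers the random forest loses roughly 18% of its test-split score on December data and logistic regression roughly 10%.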
Production
• Several Models for each Product
• Ensemble predictions for each product separately
• Call REST API to push predictions @ scale to Ad Networks
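The production steps above might look roughly like the sketch below: combine the scores of several models for one product, select an audience, and split it into request bodies for a REST push. Averaging as the ensemble rule, the 0.5 threshold, the payload shape, and all names are assumptions for illustration; the actual HTTP call to an ad network is left out because its endpoint and format are not given in the slides.

```python
import json

def ensemble_scores(model_scores):
    """Average per-visitor scores across the models built for one product."""
    visitors = model_scores[0].keys()
    return {v: sum(m[v] for m in model_scores) / len(model_scores) for v in visitors}

def build_batches(scores, threshold, batch_size):
    """Select visitors at or above `threshold` and split them into JSON request bodies."""
    audience = sorted(v for v, s in scores.items() if s >= threshold)
    return [
        json.dumps({"visitor_ids": audience[i:i + batch_size]})
        for i in range(0, len(audience), batch_size)
    ]

# Hypothetical scores from two models for the same product.
rf = {"v1": 0.9, "v2": 0.2, "v3": 0.7, "v4": 0.6, "v5": 0.8}
lr = {"v1": 0.7, "v2": 0.4, "v3": 0.9, "v4": 0.2, "v5": 0.6}

scores = ensemble_scores([rf, lr])
batches = build_batches(scores, threshold=0.5, batch_size=2)
for body in batches:
    print(body)  # each body would be sent as one POST request
```

Batching keeps each request small and makes retries cheap when pushing large audiences.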
Overall App Flow
Campaign Results
[Chart: Performance Comparison. CTR and conversion rate (axis 0% to 250%) for Rule-based, Algo-Bespoke, and Algo-MC audiences]
[Chart: Marketing Activity Share. Share of impressions, clicks, and conversions (axis 0% to 120%) for Rule-based, Algo-Bespoke, and Algo-MC audiences]
