A “Real-Time” Architecture for Machine Learning Execution with MLeap
Noah Pritikin, Site Reliability Engineer
Spark+AI Summit 2019 | April 24, 2019
Machine Learning Applications
Detecting credit-card fraud
Financial markets
Online advertising
Recommender systems
Robotics
…
Agriculture
Automated medical diagnosis
Computer vision
Insurance
Marketing
Sentiment analysis
User behavior analytics
Weather forecasting
…
I am defining “Real-Time” as <100ms for the context of this presentation.
Not “Real-Time” “Real-Time”
Agenda
What is Kount?
Data Pipeline Context
“Real-Time” Architecture / Model Governance
Statistical Metrics and Monitoring
Q&A
What is Kount?
Fighting Fraud, Boosting Revenue
Industry-Leading Technology & Experience
Developing fraud-fighting technology since 1999
AI/Machine Learning Implemented in 2007
Dozens of Patented Technologies
Continuous Innovation
A SaaS-Based, All-in-One Fraud Mitigation Platform
Safeguard Some of the World’s Largest
Merchants
Payment Service Providers
Ecommerce Platforms
$80M Investment from CVC Growth Partners
Data Pipeline Context
Highly-available Client-facing
Infrastructure / Services
Kount Data Lake
Data Science
Magical Fairy Dust!
Machine Learning Model
(MLeap Pipeline)
Machine Learning
Execution Platform
MLeap API Servers
“Real-Time” Architecture / Model Governance
First iteration was our baseline for improvement.
We were faced with a technical problem to solve…
Kount Boost Technology™ was released to production in October 2017.
First iteration of the architecture based on Python3 / Scikit-learn worked, but…
• Lacked portability
• Challenging to scale into the future
• Lacked multiple model support
• Limited model governance
Built in-house Apache Spark cluster in January 2018.
• Began iterating on Boost Technology™ model improvements (e.g. feature engineering, tuning model hyperparameters, etc.).
Spark ML-generated models depend on a SparkContext, but “real-time” predictions were required!
“Real-Time” Architecture Overview
Feature Extraction separated from
Transaction Prediction
Hosting multiple models allows for blue-green deployments
Centralized model governance
Load balancer deployed in a “sidecar
proxy” implementation allowing for
simpler Feature Extraction instance
design
• Backend health checks make a
prediction on a test transaction
MLeap API instances run a GC-optimized Java 8 configuration
JVM metrics (e.g. Jolokia, etc.)
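The backend health check described above (make a prediction on a test transaction) could be sketched roughly as follows. This is an illustration only: the `predict` callable, the fields of `TEST_TRANSACTION`, and the 100ms budget used as a threshold are assumptions, not details from the talk. In a real deployment `predict` would call an MLeap API instance over the network.

```python
import time
from typing import Callable, Mapping

# Hypothetical test transaction; the real feature names are not given in the talk.
TEST_TRANSACTION = {"amount": 42.0, "country": "US"}


def health_check(predict: Callable[[Mapping[str, object]], float],
                 budget_ms: float = 100.0) -> bool:
    """Return True if the backend scores the test transaction within budget."""
    start = time.perf_counter()
    try:
        score = predict(TEST_TRANSACTION)
    except Exception:
        # Any failure while predicting marks the backend as unhealthy.
        return False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Healthy means: a numeric score came back, and it came back fast enough.
    return isinstance(score, float) and elapsed_ms <= budget_ms


def fake_predict(transaction: Mapping[str, object]) -> float:
    """Stand-in for a working backend (hypothetical)."""
    return 0.12


def broken_predict(transaction: Mapping[str, object]) -> float:
    """Stand-in for a backend that is down (hypothetical)."""
    raise ConnectionError("backend down")
```

Checking a full prediction, rather than only an open port, lets the load balancer remove an instance whose model failed to load even though the process is still running.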
Dark Production Infrastructure
An entirely separate parallel infrastructure
in production
NO customer impact
NO “real-time” requirements
Parallelization is implemented via a
message bus (e.g. Kafka, Kinesis,
ZeroMQ, etc.)
Optimize cost through only processing a
fraction of production traffic (e.g. 1/3)
Only logs raw predictions that are
returned from MLeap for later analysis
Dark production infrastructure enables model governance / validation.
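One way a message-bus consumer might process only a fraction of production traffic (e.g. 1/3) is to sample deterministically on a transaction identifier. The sketch below is an assumption about how this could be done, not the speaker's implementation; `in_dark_sample`, `mirror_to_dark`, and the `txn-…` identifiers are hypothetical.

```python
import hashlib
from typing import Iterable, List


def in_dark_sample(transaction_id: str, numerator: int = 1, denominator: int = 3) -> bool:
    """Deterministically select roughly numerator/denominator of transactions."""
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto a bucket in [0, denominator).
    bucket = int.from_bytes(digest[:8], "big") % denominator
    return bucket < numerator


def mirror_to_dark(transaction_ids: Iterable[str]) -> List[str]:
    """Return the transactions that the dark infrastructure would process."""
    return [tid for tid in transaction_ids if in_dark_sample(tid)]


if __name__ == "__main__":
    ids = [f"txn-{i}" for i in range(30000)]
    sampled = mirror_to_dark(ids)
    print(f"dark share: {len(sampled) / len(ids):.3f}")
```

Hashing the identifier instead of drawing a random number means the same transaction is always either in or out of the sample, which keeps later analysis of the logged raw predictions reproducible.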
Tools Enabling Model Governance
Centrally track state of machine learning models – end-to-end!
• Train model & verify quality
• Add model to governance data store
• Deploy model to dark production infrastructure MLeap API instances
• Dark production infrastructure test? (Bad / Good)
• Good: Deploy to available production MLeap API instances
• Migrate production traffic to MLeap API instances hosting new model
• Replaced model? (No / Yes)
• Yes: Unload retired model from MLeap API instances
• End
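The governance flow above can be read as a small state machine kept in a central store. The following is a minimal sketch under stated assumptions: the state names, the `GovernanceStore` class, and in particular the `REJECTED` state for the "Bad" branch are hypothetical and are not taken from the talk.

```python
from enum import Enum


class ModelState(Enum):
    TRAINED = "trained"                      # trained and quality verified
    REGISTERED = "registered"                # added to the governance data store
    DARK = "dark"                            # deployed to dark production
    REJECTED = "rejected"                    # assumed outcome of a "Bad" dark test
    PRODUCTION_LOADED = "production_loaded"  # loaded on production instances
    SERVING = "serving"                      # receiving production traffic
    RETIRED = "retired"                      # unloaded after being replaced


# Allowed transitions, following the order of the steps in the flow.
ALLOWED = {
    ModelState.TRAINED: {ModelState.REGISTERED},
    ModelState.REGISTERED: {ModelState.DARK},
    ModelState.DARK: {ModelState.PRODUCTION_LOADED, ModelState.REJECTED},
    ModelState.PRODUCTION_LOADED: {ModelState.SERVING},
    ModelState.SERVING: {ModelState.RETIRED},
    ModelState.REJECTED: set(),
    ModelState.RETIRED: set(),
}


class GovernanceStore:
    """In-memory stand-in for a governance data store."""

    def __init__(self):
        self._states = {}

    def add(self, model_id: str) -> None:
        self._states[model_id] = ModelState.TRAINED

    def advance(self, model_id: str, new_state: ModelState) -> None:
        current = self._states[model_id]
        if new_state not in ALLOWED[current]:
            raise ValueError(
                f"{model_id}: {current.value} -> {new_state.value} is not allowed"
            )
        self._states[model_id] = new_state

    def state(self, model_id: str) -> ModelState:
        return self._states[model_id]
```

Rejecting illegal transitions in one place is what makes the tracking "central": a model cannot reach production traffic without first passing through the dark production step.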
Statistical Metrics and Monitoring
“Real-Time” Architecture Performance – Transforming LEAP frames
This is NOT machine learning model performance (e.g. TOC curve, ROC
curve, PR curve, etc.)
“Real-Time” system requires metrics to measure the systemic performance.
Averages + Distributions!
Due to “real-time” requirements, averages don’t cut it (by themselves…)
Distributions provide critical visibility in monitoring low latency systems.
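A small illustration of why an average alone can hide a latency problem. The latencies below are synthetic, invented for this example (98 fast requests and 2 slow ones), and `nearest_rank_percentile` is a hypothetical helper using the nearest-rank definition of a percentile.

```python
import statistics
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: int) -> float:
    """Smallest sample such that at least pct% of samples are at or below it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [7.0] * 98 + [150.0] * 2

mean_ms = statistics.mean(latencies_ms)
p95_ms = nearest_rank_percentile(latencies_ms, 95)
p99_ms = nearest_rank_percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean={mean_ms:.2f}ms p95={p95_ms}ms p99={p99_ms}ms")
```

Here the mean stays under 10ms while the 99th percentile is 150ms, well over a 100ms “real-time” budget, so only the percentile reveals that some requests miss the target.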
Applied Statistics - Improvement with MLeap!
                                 Average   95th Percentile   99th Percentile   Standard Deviation
Boost without MLeap (previous)   19.27ms   24ms              37ms              5.31ms
Boost with MLeap (current)       7.00ms    9ms               16ms              2.41ms
99th percentile saw a ~56% improvement!
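The improvement figure can be checked directly from the numbers on the slide. In this sketch the dictionary keys and the `improvement_pct` helper are hypothetical names; the values are the ones reported above.

```python
# Latency statistics in milliseconds, as reported on the slide.
previous = {"average": 19.27, "p95": 24.0, "p99": 37.0, "stddev": 5.31}
current = {"average": 7.00, "p95": 9.0, "p99": 16.0, "stddev": 2.41}


def improvement_pct(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100.0


improvements = {
    name: round(improvement_pct(previous[name], current[name]), 1)
    for name in previous
}

if __name__ == "__main__":
    for name, pct in improvements.items():
        print(f"{name}: {pct}% lower")
```

For the 99th percentile this gives (37 - 16) / 37, about 56.8%, which is consistent with the ~56% quoted on the slide.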
Consider Improvements to Your “Real-Time” Architecture!
MLeap…
Model governance…
Dark Production Infrastructure (assisting with model testing)…
Latency Metrics (emphasize the use of distributions)…
Further reading…
• “Deploying Apache Spark Supervised Machine Learning Models to
Production with MLeap” - https://medium.com/@combust/9e0fb57f79db
• MLeap GitHub repo - https://github.com/combust/mleap
• MLeap documentation - http://mleap-docs.combust.ml/
Thank you! … and, Q&A?
