This talk describes a production environment that hosts a large random forest model on a cluster of MLeap runtimes. A microservice architecture with a Postgres database backend manages configuration. The architecture provides full traceability and model governance through the entire lifecycle while cutting execution time by nearly two-thirds.

Kount provides certainty in digital interactions like online credit card transactions. Our production environment has extreme requirements for availability: we process hundreds of transactions per second, have no scheduled downtime, and achieve 99.99% annual uptime. One of our scores uses a random forest classifier with 250 trees and 100,000 nodes per tree. Our original implementation serialized a scikit-learn model, which by itself takes 1 GB in memory. It required exactly identical environments in training, where the model was serialized, and in production, where it was deserialized and evaluated. This is risky when maintaining high uptime with no planned downtime.

The improved solution load balances across a cluster of API servers hosting MLeap runtimes. These model execution runtimes scale separately from the data pre-processing pipeline, which is the more expensive step in our application. Each pre-processing application is connected to multiple MLeap runtimes to provide complete redundancy and independent scaling.

We extend model governance into the production environment using a set of services wrapped around a Postgres backend. These services manage model promotion and each model's role across several production, QA, and integration environments.

Finally, we describe a "shadow" pipeline in production that can replace any or all portions of transaction evaluation with alternative models and software. A Kafka message bus provides copies of live production transactions to the shadow servers, where results are logged for analysis. Since this shadow environment is managed through the same services, code and models can be directly promoted or retired after being test-run on live data streams.
Speaker: Noah Pritikin
2. Machine Learning Applications
“Real-Time”:
• Detecting credit-card fraud
• Financial markets
• Online advertising
• Recommender systems
• Robotics
• …
Not “Real-Time”:
• Agriculture
• Automated medical diagnosis
• Computer vision
• Insurance
• Marketing
• Sentiment analysis
• User behavior analytics
• Weather forecasting
• …
I am defining “Real-Time” as <100 ms for the context of this presentation.
3. Agenda
What is Kount?
Data Pipeline Context
“Real-Time” Architecture / Model Governance
Statistical Metrics and Monitoring
Q&A
5. Fighting Fraud, Boosting Revenue
Industry-Leading Technology & Experience
Developing fraud-fighting technology since 1999
AI/Machine Learning Implemented in 2007
Dozens of Patented Technologies
Continuous Innovation
A SaaS-based, all-in-one fraud mitigation platform safeguarding some of the world’s largest:
• Merchants
• Payment Service Providers
• Ecommerce Platforms
$80M Investment from CVC Growth Partners
7. Data Pipeline Context
Highly-available client-facing infrastructure / services → Kount Data Lake → Data Science (“Magical Fairy Dust!”) → Machine Learning Model (MLeap Pipeline) → Machine Learning Execution Platform (MLeap API Servers)
9. First iteration was our baseline for improvement.
We were faced with a technical problem to solve…
Kount Boost Technology™ was released to production in October 2017.
First iteration of the architecture based on Python3 / Scikit-learn worked, but…
• Lacked portability
• Challenging to scale into the future
• Lacked multiple model support
• Limited model governance
Built in-house Apache Spark cluster in January 2018.
• Began iterating on Boost Technology™ model improvements (e.g. feature engineering, tuning model hyperparameters, etc.).
Spark ML-generated models depend on a SparkContext, but “real-time” predictions are required!
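MLeap sidesteps the SparkContext dependency by serializing the fitted pipeline to a portable bundle that a standalone runtime evaluates, so a prediction becomes a plain HTTP call carrying a LeapFrame (MLeap's JSON frame of schema plus rows). A minimal stdlib sketch of building such a request — the feature names, types, and endpoint are hypothetical, not Kount's actual schema:

```python
import json

def make_leap_frame(features: dict) -> dict:
    """Build a LeapFrame-style JSON payload (schema + rows) from feature values.

    The double/string typing below is illustrative; a real schema mirrors the
    training pipeline's input columns exactly.
    """
    fields, row = [], []
    for name, value in features.items():
        fields.append({"name": name,
                       "type": "double" if isinstance(value, (int, float)) else "string"})
        row.append(value)
    return {"schema": {"fields": fields}, "rows": [row]}

# Hypothetical feature vector for one transaction:
frame = make_leap_frame({"amount": 125.50, "card_country": "US", "velocity_1h": 3.0})
payload = json.dumps(frame)
# An MLeap API server hosting the bundle would score this via an HTTP POST, e.g.
# (endpoint illustrative):
#   urllib.request.Request("http://mleap-api/transform", data=payload.encode())
```

Because the runtime only needs the bundle and this JSON contract, the training environment and the serving environment no longer have to match library-for-library.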
10. “Real-Time” Architecture Overview
Feature Extraction separated from Transaction Prediction
Hosting multiple models allows for blue-green deployments
Centralized model governance
Load balancer deployed in a “sidecar proxy” implementation, allowing for a simpler Feature Extraction instance design
• Backend health checks make a prediction on a test transaction
MLeap API instances run a GC-optimized Java 8 configuration
JVM metrics (e.g. Jolokia, etc.)
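The "health check makes a prediction" idea can be sketched as scoring a canned test transaction and validating the response before the sidecar admits the backend to rotation. Everything here — the frame contents, the score living in the last column, and the [0, 1] range — is an assumption for illustration:

```python
import json

# Canned test transaction the health check scores on every probe (values illustrative).
TEST_FRAME = {"schema": {"fields": [{"name": "amount", "type": "double"}]},
              "rows": [[42.0]]}

def backend_is_healthy(response_body: str) -> bool:
    """Return True if the MLeap instance produced a sane prediction.

    Reads the last column of the returned row as the model output; a real
    check would target the pipeline's named output column instead.
    """
    try:
        frame = json.loads(response_body)
        score = frame["rows"][0][-1]
    except (ValueError, KeyError, IndexError):
        return False
    return isinstance(score, (int, float)) and 0.0 <= score <= 1.0

# The sidecar would POST TEST_FRAME to the local instance and feed the body in:
assert backend_is_healthy('{"rows": [[42.0, 0.87]]}')
assert not backend_is_healthy("upstream timeout")
```

Checking an actual prediction (rather than just a TCP connect) catches a backend whose JVM is up but whose model failed to load.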
12. Dark Production Infrastructure
An entirely separate parallel infrastructure in production
NO customer impact
NO “real-time” requirements
Parallelization is implemented via a message bus (e.g. Kafka, Kinesis, ZeroMQ, etc.)
Optimize cost by processing only a fraction of production traffic (e.g. 1/3)
Only logs the raw predictions returned from MLeap for later analysis
Dark production infrastructure enables model governance / validation.
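One way to process only a fraction of traffic is to hash the transaction ID, so the same transaction deterministically lands in (or out of) the shadow sample regardless of which consumer sees it. A stdlib sketch using the 1/3 ratio from the slide — the sampling scheme and the commented consumer loop are illustrative, not Kount's implementation:

```python
import hashlib

SHADOW_FRACTION = 3  # process ~1/3 of production traffic (per the slide)

def in_shadow_sample(transaction_id: str) -> bool:
    """Deterministically select ~1/N of transactions by hashing their ID."""
    digest = hashlib.sha256(transaction_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHADOW_FRACTION == 0

# In the dark pipeline, a message-bus consumer (Kafka/Kinesis/ZeroMQ) would do
# something like (function names hypothetical):
#   for msg in consumer:
#       if in_shadow_sample(msg.key):
#           log_raw_prediction(score_with_candidate_model(msg.value))
sampled = sum(in_shadow_sample(f"txn-{i}") for i in range(30_000))
print(sampled / 30_000)  # close to 1/3
```

Hash-based sampling keeps the shadow result set reproducible across replays of the same message stream, which matters when comparing candidate models on identical traffic.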
13. Tools Enabling Model Governance
Centrally track state of machine learning models – end-to-end!
1. Train model & verify quality
2. Add model to governance data store
3. Deploy model to dark production infrastructure (MLeap API instances)
4. Dark production infrastructure test?
   • Bad → back to training
   • Good → continue
5. Deploy to available production MLeap API instances
6. Migrate production traffic to the MLeap API instances hosting the new model
7. Replaced a model?
   • Yes → unload the retired model from MLeap API instances
   • No → end
8. End
15. “Real-Time” Architecture Performance – Transforming LEAP frames
This is NOT machine learning model performance (e.g. TOC curve, ROC curve, PR curve, etc.)
A “real-time” system requires metrics that measure its systemic performance.
16. Averages + Distributions!
Due to “real-time” requirements, averages don’t cut it (by themselves…)
Distributions provide critical visibility when monitoring low-latency systems.
17. Applied Statistics
Boost without MLeap (previous):
Average: 19.27 ms | 95th percentile: 24 ms | 99th percentile: 37 ms | Standard deviation: 5.31 ms
Boost with MLeap (current):
Average: 7.00 ms | 95th percentile: 9 ms | 99th percentile: 16 ms | Standard deviation: 2.41 ms
Improvement with MLeap! The 99th percentile saw a ~56% improvement!
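These metrics are cheap to compute, and doing so shows why averages alone mislead: two latency samples can have similar means while their 99th percentiles differ wildly. A stdlib sketch with synthetic data (not Kount's measurements); the nearest-rank percentile here is one simple convention among several:

```python
import statistics

def latency_report(samples_ms):
    """Average, tail percentiles, and spread for a latency sample (milliseconds)."""
    xs = sorted(samples_ms)
    def pct(p):
        # Nearest-rank percentile; monitoring systems vary in exact method.
        return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]
    return {"avg": statistics.fmean(xs), "p95": pct(95), "p99": pct(99),
            "stdev": statistics.stdev(xs)}

steady = [7.0] * 100                  # flat 7 ms service
spiky = [5.0] * 98 + [60.0, 45.0]     # lower average, ugly tail
print(latency_report(steady))  # avg 7.0, p99 7.0
print(latency_report(spiky))   # avg below 7.0, but p99 is 60.0
```

The spiky service "wins" on average yet violates a <100 ms-style SLO far more often at the tail, which is exactly what the distribution view surfaces.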
18. Consider Improvements to Your “Real-Time” Architecture!
MLeap…
Model governance…
Dark Production Infrastructure (assisting with model testing)…
Latency Metrics (emphasize the use of distributions)…
Further reading…
• “Deploying Apache Spark Supervised Machine Learning Models to Production with MLeap” - https://medium.com/@combust/9e0fb57f79db
• MLeap GitHub repo - https://github.com/combust/mleap
• MLeap documentation - http://mleap-docs.combust.ml/