Real-time applications of predictive models must be able to generate predictions at the rate that transactions are generated. Previously, such applications of models trained using R needed to be converted to other languages like C++ or Java to achieve the required throughput. In this talk, I’ll describe how to use the in-database R processing capabilities of Microsoft R Server to detect fraud in a SQL Server database of loan records at a rate exceeding one million transactions per second. I will also show the process of training the underlying gradient-boosted tree model on a large training set using the out-of-memory algorithms of Microsoft R.
4. Generating Predictions
Batch Mode
• Create many (millions!) predictions at once
• Time required proportional to number of predictions
Real Time
• Only a few (maybe only one!) data point available to predict
– There may be multiple requests in a short timeframe
• Latency is the key metric here
– Many applications require sub-second latency at endpoint
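The contrast between the two modes can be sketched in base R. This is a minimal illustration, not the model from the talk: the data are synthetic, the coefficients are invented, and a plain glm stands in for the gradient-boosted tree model. Only the column names int_rate, revol_util and is_bad come from the deck.

```r
# Synthetic stand-in data (assumption: not the Lending Club data itself).
set.seed(42)
n <- 10000
train <- data.frame(
  int_rate   = runif(n, 5, 30),
  revol_util = runif(n, 0, 100)
)
# Invented relationship, used only so the model has something to fit.
p <- plogis(-4 + 0.08 * train$int_rate + 0.01 * train$revol_util)
train$is_bad <- rbinom(n, 1, p)

model <- glm(is_bad ~ int_rate + revol_util, data = train, family = binomial())

# Batch mode: score every row in one call; total time grows with row count.
batch_time <- system.time(
  batch_scores <- predict(model, newdata = train, type = "response")
)[["elapsed"]]

# Real time: score a single row; per-request latency is what matters.
one_row <- train[1, ]
single_time <- system.time(
  single_score <- predict(model, newdata = one_row, type = "response")
)[["elapsed"]]

cat(sprintf("batch:  %d rows in %.4f s\n", nrow(train), batch_time))
cat(sprintf("single: %d row  in %.4f s\n", nrow(one_row), single_time))
```

On most machines the single-row call takes a time comparable to a much larger batch, because fixed per-call overhead dominates. That overhead is what a real-time endpoint has to minimize.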
5. Real-Time Operationalization Options
• Rewrite prediction code in some other language
– PMML / C++ / Java / …
• OR, use your R code:
– Deploy as a web service with Microsoft R Server
– Deploy as a stored procedure in SQL Server
6. Lending Club Loan Performance Data
• www.lendingclub.com/info/download-data.action
– Feature selection and generation: aka.ms/lendingclub
LoanStatNew               Description
all_util                  Balance to credit limit on all trades
annual_inc_joint          The combined self-reported annual income provided by the co-borrowers during registration
dti_joint                 A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
int_rate                  Interest rate on the loan
mths_since_last_record    The number of months since the last public record
revol_util                Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
total_rec_prncp           Principal received to date
is_bad (generated)        Late > 16 days, Default, or Charged Off
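The generated is_bad label could be derived along the following lines. This is a sketch only: the loan_status column name and the exact status strings are assumptions about the Lending Club export, not taken from the slides, and should be checked against the downloaded data.

```r
# Hypothetical sample of loan_status values (assumed spellings).
loans <- data.frame(
  loan_status = c("Current", "Fully Paid", "Late (16-30 days)",
                  "Late (31-120 days)", "Default", "Charged Off",
                  "In Grace Period"),
  stringsAsFactors = FALSE
)

# Statuses treated as "bad": late by more than 16 days, default, charged off.
bad_statuses <- c("Late (16-30 days)", "Late (31-120 days)",
                  "Default", "Charged Off")

# TRUE where the loan falls into one of the bad statuses.
loans$is_bad <- loans$loan_status %in% bad_statuses
print(loans)
```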
7. Operationalization with Microsoft R Server
Data Scientist (Deployment)
• Publish R function into web services
• publishService from Microsoft R Client (mrsdeploy package)
Developer (Integration)
• Swagger API service
• Consume with any programming language
Quant (Consumption)
• Explore and consume services in R directly
• Microsoft R Client (mrsdeploy package)
IT Administrator (Configuration)
• Microsoft R Server configured for operationalizing R analytics
• Data Science Virtual Machine: Azure GS5 instance, 32 cores, 448 GB RAM
8. Flexible vs Real-Time Deployment
Flexible Deployment: Publish R as Web Service
• Any R function or package
• R interpreter runs on-demand in Swagger via REST API
Real-Time Deployment: Publish R model object
• RevoScaleR or MicrosoftML models
• Prediction engine generates scores from data via REST API
Flexible deployment:
library(mrsdeploy)
publishService(
  serviceType = 'Script',
  code = <<R script or function>>)
Real-time deployment:
library(mrsdeploy)
publishService(
  serviceType = 'Realtime',
  model = <<R object>>)
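A slightly fuller version of the flexible publishing call might look like the sketch below. It assumes Microsoft R Server 9.1 or later with operationalization configured and the mrsdeploy package installed on the client. The endpoint, credentials, service name, version string, and the score_loan function with its coefficients are all hypothetical placeholders, not values from the talk. The mrsdeploy calls sit inside a function, so nothing contacts a server until it is called.

```r
# Hypothetical scoring function standing in for <<R script or function>>.
# The coefficients are invented for illustration.
score_loan <- function(int_rate, revol_util) {
  plogis(-4 + 0.08 * int_rate + 0.01 * revol_util)
}

# Publishes score_loan as a flexible ('Script') web service.
# Assumes mrsdeploy is installed; endpoint and credentials are placeholders.
publish_flexible <- function(endpoint, username, password) {
  # Authenticate against the operationalization server without
  # opening a remote R session.
  mrsdeploy::remoteLogin(endpoint, username = username,
                         password = password, session = FALSE)

  mrsdeploy::publishService(
    name        = "loanScoreService",   # hypothetical service name
    code        = score_loan,
    inputs      = list(int_rate = "numeric", revol_util = "numeric"),
    outputs     = list(answer = "numeric"),
    v           = "v1.0.0",             # hypothetical version label
    serviceType = "Script"
  )
}

# Local check of the scoring function; no server needed.
print(score_loan(10, 50))
```

Calling publish_flexible("http://localhost:12800", "admin", "<password>") would then publish the service. The address shown is only an example.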
9. Real-Time Deployment Models
Linear Regression (rxLinMod, rxFastLinear)
Logistic Regression (rxLogit, rxLogisticRegression)
Classification / Regression trees (rxDTree, rxFastTrees)
Classification / Regression forests (rxDForest, rxFastForest)
Stochastic gradient-boosted decision trees (rxBTrees)
One-class Support Vector Machines (rxOneClassSvm)
Convolutional Neural Networks (rxNeuralNet)
Also: pre-trained models for text sentiment and image featurization
10. FLEXIBLE AND REAL-TIME SCORING WITH MICROSOFT R SERVER
Demonstration
Server: Azure Data Science Virtual Machine, Azure GS5 instance (32 cores, 448 GB memory)
Client: SurfaceBook / Microsoft R Client
19. Operationalization Overview
Flexible Operationalization
• Any R Function / Package
Platform: SQL Server
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'<<R script>>'
Platform: Microsoft R Server
library(mrsdeploy)
publishService(
  serviceType = 'Script',
  code = <<R script or function>>)

Real-Time Operationalization
• Specific RevoScaleR / MicrosoftML models
Platform: SQL Server
EXEC sp_rxPredict
  @model = <<serialized R object>>,
  @inputData = <<SQL query>>
Platform: Microsoft R Server
library(mrsdeploy)
publishService(
  serviceType = 'Realtime',
  model = <<R object>>)
• Use Microsoft R Server 9+ or SQL Server 2016+ as the deployment server
• Flexible Operationalization supports any R code / package
• Real-Time Operationalization supports Microsoft R models with improved latency
20. Thank You!
David Smith @revodavid
R Community Lead, Microsoft
Special thanks:
Pratik Palnitkar, Microsoft
Arun Gurunathan, Microsoft
Download Microsoft R Client: aka.ms/rclient
Data Science Virtual Machine: aka.ms/dsvm
Editor's notes
Source: https://msdn.microsoft.com/en-us/microsoft-r/operationalize/data-scientist-manage-services#publish-web-services
The model object must be created with one of the following supported functions:
From the RevoScaleR package: rxLogit, rxLinMod, rxBTrees, rxDTree, and rxDForest
From the MicrosoftML package: only the machine learning task and transform functions, which include rxFastTrees, rxFastForest, rxLogisticRegression, rxOneClassSvm, rxNeuralNet, rxFastLinear, featurizeText, concat, categorical, categoricalHash, selectFeatures, featurizeImage, getSentiment, loadImage, resizeImage, extractPixels, selectColumns, and dropColumns
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-azure-ml-netsharp-reference-guide
Remote client, 10 threads with a payload of 100 predictions