
Raven: End-to-end Optimization of ML Prediction Queries


Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
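
For concreteness, here is a minimal sketch of such a prediction query in Python: pandas for the data processing part, scikit-learn for the ML part. The table and column names echo the hospital example used later in the deck and are purely illustrative.

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Data processing part: join two patient tables and filter on a predicate.
patients = pd.DataFrame({"id": [1, 2], "pregnant": [1, 0], "age": [38, 29]})
blood_tests = pd.DataFrame({"id": [1, 2], "bp": [150.0, 118.0]})
data = patients.merge(blood_tests, on="id")
data = data[data["pregnant"] == 1]

# ML part: invoke a trained model on the featurized rows.
model = DecisionTreeRegressor().fit([[30, 120.0], [35, 150.0]], [2, 8])
data["length_of_stay"] = model.predict(data[["age", "bp"]])
print(data[data["length_of_stay"] > 7])   # "expected to stay more than a week"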

We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.

This allows us to introduce optimization rules that
(i) reduce unnecessary computations by passing information between the data processing and ML operators,
(ii) leverage operator transformations (e.g., turning a decision tree into a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator. A short sketch of rule (ii) follows this list.
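
As a hedged illustration of rule (ii), the sketch below compiles a trained scikit-learn decision tree into a nested SQL CASE expression. It is a simplified re-implementation of the idea, not Raven's actual translator; the feature names are made up.

from sklearn.tree import DecisionTreeRegressor

def tree_to_sql(model, feature_names, node=0):
    # Recursively turn the fitted tree structure into a CASE WHEN expression.
    t = model.tree_
    if t.children_left[node] == -1:               # leaf: emit the predicted value
        return f"{t.value[node][0][0]:.4f}"
    feat = feature_names[t.feature[node]]
    left = tree_to_sql(model, feature_names, t.children_left[node])
    right = tree_to_sql(model, feature_names, t.children_right[node])
    return f"CASE WHEN {feat} <= {t.threshold[node]:.4f} THEN {left} ELSE {right} END"

model = DecisionTreeRegressor(max_depth=2).fit(
    [[25, 110.0], [30, 125.0], [35, 150.0]], [2, 4, 7])
print("SELECT", tree_to_sql(model, ["age", "bp"]), "AS length_of_stay FROM data")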

We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
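
Raven is packaged as a Catalyst extension; the snippet below shows how such an extension is typically enabled on a Spark session. The extension class name is hypothetical, since the presentation does not give the actual package or class.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("raven-demo")
         # Registers custom optimizer rules with Spark's Catalyst optimizer.
         # "com.microsoft.raven.RavenExtensions" is a placeholder name.
         .config("spark.sql.extensions", "com.microsoft.raven.RavenExtensions")
         .getOrCreate())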



Raven: End-to-end Optimization of ML Prediction Queries

  1. Raven: End-to-end Optimization of ML Prediction Queries
     Konstantinos Karanasos, Kwanghyun Park (Gray Systems Lab, Microsoft)
  2. Enterprise-grade ML lifecycle
     [Architecture diagram: offline model development/training (data selection, featurization, model training) feeding an online model inference path (featurization, model scoring over live data), surrounded by data catalogs, governance, model tracking & provenance, access control, logs & telemetry, and deployment/optimization policies.]
  3. Use Case: Length-of-stay in Hospital
     Model: "Predict length of stay of a patient in the hospital"
     Prediction query: "Find pregnant patients that are expected to stay in the hospital more than a week"
     [Workflow diagram: the data scientist covers data exploration/preparation and model training; the analyst/developer covers data selection/transformation, model scoring, and model deployment.]
  4. Prediction Queries: Baseline Approach
     [Diagram: app logic calls a web server over HTTP; featurization and the model run in a separate container behind REST; the DBMS is reached via ODBC.]
     Enterprise features:
     • Security: data and models outside of the DB
     • Extra infrastructure
     • High TCO
     • Lack of tooling/best practices
     Performance:
     • Data movement
     • Latency
     • Throughput on batch scoring
  5. Prediction Queries: In-Engine Evaluation
     Enterprise features:
     • Security: data and models within the DBMS
     • Reuse of existing infrastructure
     • Language/tools/best practices
     • Low TCO
     Performance:
     • Up to 13x faster on Spark
     • Up to 330x faster on SQL Server
  6. Raven: An Optimizer for Prediction Queries in Azure Data
     Key ideas: express data and ML operations in a common graph; embed high-performance ML inference runtimes within our data engines.
     Running example. M: model pipeline (Data Scientist):
       INSERT INTO model (name, model) AS ("duration_of_stay",
         "from sklearn.pipeline import Pipeline
          from sklearn.preprocessing import StandardScaler
          from sklearn.tree import DecisionTreeClassifier
          from …
          model_pipeline = Pipeline([('union', FeatureUnion(… ('scaler', StandardScaler()), …)),
                                     ('clf', DecisionTreeClassifier())])");
     Q: SQL query invoking the model (Data Analyst):
       DECLARE @model varbinary(max) =
         (SELECT model FROM scoring_models WHERE model_name = "duration_of_stay");
       WITH data AS (
         SELECT * FROM patient_info AS pi
         JOIN blood_tests AS be ON pi.id = be.id
         JOIN prenatal_tests AS pt ON be.id = pt.id)
       SELECT d.id, p.length_of_stay
       FROM PREDICT(MODEL=@model, DATA=data AS d) WITH(length_of_stay_Pred float) AS p
       WHERE d.pregnant = 1 AND p.length_of_stay > 7;
     [Figure: static analysis turns the inference query MQ into a unified IR (joins over patient_info, blood_tests, and prenatal_tests feeding feature extraction, categorical encoding, rescaling, concat, and a DecisionTreeClassifier); cross optimization plus runtime code generation yield an optimized plan with a SQL-inlined model (a switch/CASE on blood pressure) or an equivalent neural network.]
  7. Constructing the IR
     Raven IR operators: relational algebra, linear algebra, other ML operators and data featurizers, UDFs.
     The IR is built by static analysis of the prediction query; SQL+ML is supported today, and support for Python is being added.
     [Figure: the running example query and pipeline from slide 6, analyzed into the unified IR.]
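
A minimal sketch of what a unified IR node could look like, with relational and ML operators sharing one graph structure. All class and field names here are hypothetical; the slides do not show Raven's concrete IR classes.

from dataclasses import dataclass, field
from typing import List

@dataclass
class IRNode:
    op: str                       # e.g. "Join", "Filter", "Scaler", "DecisionTree"
    kind: str                     # "relational" | "linear_algebra" | "ml" | "udf"
    params: dict = field(default_factory=dict)
    children: List["IRNode"] = field(default_factory=list)

# Tiny fragment of the hospital example as one graph:
scan = IRNode("Scan", "relational", {"table": "patient_info"})
filt = IRNode("Filter", "relational", {"predicate": "pregnant = 1"}, [scan])
tree = IRNode("DecisionTreeClassifier", "ml", {"depth": 5}, [filt])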
  8. ML Inference in Azure Data Engines
     SQL Server:
     • PREDICT statement in SQL Server
     • Embedded ONNX Runtime in the engine
     • Available in Azure SQL Edge and SQL DW (part of Azure Synapse Analytics)
     Spark:
     • Introduced a new PREDICT operator
     • Similar syntax to SQL Server
     • Support for different types of models
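
A sketch of invoking PREDICT from SparkSQL via PySpark, following the SELECT PREDICT(model, col1, …) form shown on the benchmark slides. The table name, model reference, and column list are assumptions, and the query only runs on a Spark build that includes the PREDICT operator described above.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Assumes a table "hospital" and a registered model "duration_of_stay_model".
prediction = spark.sql("""
    SELECT id, PREDICT(duration_of_stay_model, age, bp) AS length_of_stay
    FROM hospital
    WHERE pregnant = 1
""")
prediction.show()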
  9. Raven: An Optimizer for Prediction Queries
     Q: "Find pregnant patients expected to stay in the hospital more than a week"
     Unified IR + cross optimization + runtime code generation.
     [Figure: the running example (model pipeline M, query Q, inference query MQ) from slide 6, with static analysis, the unified IR, and the optimized plan with the SQL-inlined model.]
  10. Raven optimizations in practice
     1. Predicate-based model pruning
     2. Model projection pushdown
     3. Model splitting
     4. Model-to-SQL translation
     5. NN translation
     6. Standard DB optimizations
     7. Compiler optimizations
     [Figure: the running example IR and optimized plan from slide 6.]
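
As a hedged illustration of model projection pushdown (optimization 2), the sketch below inspects which features a trained tree actually tests and reports the only columns the data processing part needs to produce. It is illustrative, not Raven's implementation; the feature names are made up.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

feature_names = ["age", "bp", "gender", "weight"]
X = np.random.rand(100, 4)
y = X[:, 1] * 10                       # the model ends up depending mostly on "bp"
model = DecisionTreeRegressor(max_depth=2).fit(X, y)

# Leaves are marked with negative feature ids; the rest are features the tree tests.
used = {feature_names[f] for f in model.tree_.feature if f >= 0}
print("columns the query actually needs to produce:", used)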
  11. Raven optimizations: Key Ideas
     1. Avoid unnecessary computation: information passing between model and data
     2. Pick the right runtime for each operation: translation between data and ML operations
     3. Hardware acceleration: translation to tensor computations (Hummingbird)
     [Figure: the running example IR and optimized plan from slide 6.]
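
For key idea 3, Hummingbird compiles a trained model into tensor operations so it can run on a GPU backend. A minimal sketch, assuming the hummingbird-ml and torch packages are installed (the CUDA line is optional):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from hummingbird.ml import convert

X = np.random.rand(1000, 10).astype(np.float32)
y = np.random.rand(1000).astype(np.float32)
skl_model = GradientBoostingRegressor(n_estimators=20).fit(X, y)

hb_model = convert(skl_model, "pytorch")   # model is now tensor computations
# hb_model.to("cuda")                      # move to GPU if one is available
preds = hb_model.predict(X)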
  12. Performance Evaluation: Raven in Spark (HDI)
     [Chart: end-to-end inference query time (elapsed seconds) for SparkML, Sklearn, ONNX Runtime, and Raven on Hospital (2 billion rows) and Expedia (500 million rows), over decision trees (depth 5 and 8), logistic regression (LR-.001), and gradient boosting (20 estimators).]
     Queries:
     • SELECT PREDICT(model, col1, …) FROM Hospital
     • SELECT PREDICT(model, S.col1, …) FROM listings S, hotels R1, searches R2 WHERE S.prop_id = R1.prop_id AND S.srch_id = R2.srch_id
     Best of Raven:
     • Decision Trees (DT) and Logistic Regression (LR): model projection pushdown + ML-to-SQL
     • Gradient Boosting (GB): model projection pushdown
     Raven outperforms the other ML runtimes (SparkML, Sklearn, ONNX Runtime) by up to ~44x.
  13. Performance Evaluation: Raven in Spark with GPU
     [Chart: end-to-end inference query time (elapsed seconds) for gradient boosting models on Hospital (200M rows): ONNX Runtime vs. Raven-CPU vs. Raven-GPU, across 60 estimators/depth 5, 100 estimators/depth 4, 100 estimators/depth 8, and 500 estimators/depth 8.]
     Query: SELECT PREDICT(model, col1, …) FROM Hospital
     Raven + GPU outperforms ONNX Runtime by up to ~8x for complex models.
  14. Performance Evaluation: Raven Plans in SQL Server
     [Chart: end-to-end inference query time (seconds, log scale) on hospital and expedia (100M rows each) for MADlib, SQL Server (DOP 1 and DOP 16), and Raven (DOP 1 and DOP 16), over decision trees (depth 5 and 8), logistic regression, and gradient boosting/random forests (20 estimators); annotated speedups of ~230x and ~100x.]
     Best of Raven:
     • Decision Trees (DT) and Logistic Regression (LR): model projection pushdown + ML-to-SQL
     • Gradient Boosting (GB): model projection pushdown
     Potential gains with Raven in SQL Server are significantly large!
  15. Performance Evaluation: Raven in SQL Server with GPU
     [Chart: end-to-end time (seconds) for GB models on hospital (100M rows), comparing the minimum CPU time with scikit-learn (Min. CPU-SKL) against GPU execution via Hummingbird (GPU-HB), across models from depth 3/20 estimators to depth 8/500 estimators; annotated speedups of ~100x and ~2.6x.]
     Batch size:
     • CPU: minimum query time obtained with the optimal choice of batch size (50K/100K rows)
     • GPU: 600K rows
     Potential gains with Raven and GPU acceleration are significantly large!
  16. Demo
  17. Conclusion: in-DBMS model inference
     • Raven is the first step in a long journey of incorporating ML inference as a foundational extension of relational algebra and an integral part of SQL query optimizers and runtimes
     • Novel Raven optimizer with cross optimizations and operator transformations
       - Up to 13x performance improvements on Spark
       - Up to 330x performance improvements on SQL Server
     • Integration of Raven within Spark and SQL Server
  18. Feedback
     Your feedback is important to us. Don't forget to rate and review the sessions.
  19. Backup
  20. Current state of affairs: In-application model inference
     Use case: hospital length-of-stay. "Find pregnant patients that are expected to stay in the hospital more than a week"
     Security:
     • Data leaves the DB
     • Model outside of the DB
     Performance:
     • Data movement
     • Use of Python for data operations
     [Diagram: the application pulls data out of the DBMS and scores it outside the engine.]
  21. Raven: In-DBMS model inference
     Inference query: SQL + PREDICT (SQL Server 2017 syntax) to combine SQL operations with ML inference.
     [Figure: the model pipeline M and query Q from slide 6 are stored and evaluated inside the DBMS; Raven treats models as data (SQL + ML).]
  22. Raven: In-DB model inference
     Security:
     • Data and models within the DB
     • Treat models as data
     User experience:
     • Leverage the maturity of the RDBMS
     • Connectivity, tool integration
     Can in-DBMS ML inference match (or exceed?) the performance of state-of-the-art ML frameworks? Yes, by up to 230x!
  23. Cross-optimizations in practice
     Cross-IR optimizations and operator transformations:
     - Predicate-based model pruning
     - Model projection pushdown
     - Model splitting
     - Model inlining
     - NN translation
     - Standard DB optimizations
     - Compiler optimizations
     [Figure: the running example IR and optimized plan from slide 6.]
  24. Raven overview
     Key ideas:
     1. Novel cross-optimizations between SQL and ML operations
     2. Combine high-performance ML inference engines with SQL Server
     [Figure: the running example (M, Q, MQ), static analysis, unified IR, cross optimization, and optimized plan from slide 6.]
  25. Effect of cross optimizations
     [Chart: inference time (ms, log scale) vs. dataset size (1K, 10K, 100K, 1M rows) for a random forest in scikit-learn (RF) vs. its neural-network translation on CPU (RF-NN CPU) and GPU (RF-NN GPU); annotated speedups of 24.5x, 15x, and 5.3x.]
  26. Execution modes
     In-process: deep integration of ONNX Runtime in SQL Server.
     Out-of-process: for queries/models not supported by our static analyzer; uses sp_execute_external_script (Python, R, Java).
     Containerized: for languages not supported by out-of-process execution.
  27. In-process execution
     Native PREDICT: execute the model in the same process as SQL Server; rudimentary support since SQL Server 2017 (five hardcoded models).
     Take advantage of state-of-the-art ML inference engines (compiler optimizations, code generation, hardware acceleration): SQL Server + ONNX Runtime.
     Some challenges:
     • Align schemata between DB and model
     • Transform data to/from tensors (avoid copying)
     • Cache inference sessions
     • Allow for different ML engines
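
A sketch of the per-query work the in-process integration amortizes: create the ONNX Runtime inference session once, cache it, and convert row batches to tensors. The model path, input name, and shapes are placeholders for illustration.

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("duration_of_stay.onnx")   # create once, then cache
input_name = session.get_inputs()[0].name
batch = np.random.rand(100_000, 10).astype(np.float32)    # rows converted to a tensor
predictions = session.run(None, {input_name: batch})[0]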
  28. Current status
     In-process predictions:
     - Implementation in SQL Server 2019
     - Public preview in Azure SQL DB Edge
     - Private preview in Azure SQL DW
     Out-of-process predictions:
     - ONNX Runtime as an external language (ongoing)
  29. Benefits of deep integration
     [Chart: total inference time (ms, log scale) vs. dataset size (1K to 10M rows) for Random Forest and MLP models, comparing ORT, Raven, and Raven ext.]
