Using Spark MLlib Models in a Production Training and Serving Platform: Experiences and Extensions
Anne Holler, Michael Mui
Uber
#UnifiedAnalytics #SparkAISummit
Introduction
● Michelangelo is Uber’s Machine Learning Platform
○ Supports training, evaluation, and serving of ML models in production
○ Uses Spark MLlib for training and serving at scale
● Michelangelo's use of Spark MLlib has evolved over time
○ Initially used as part of a monolithic training and serving platform, with
hardcoded model pipeline stages saved/loaded from protobuf
○ Initially customized to support online serving, with
online serving APIs added to Transformers ad hoc
What are Spark Pipelines: Estimators and Transformers
● Estimator: Spark abstraction of a learning algorithm or any algorithm that fits or trains on data
● Transformer: Spark abstraction of an ML model stage that includes feature transforms and predictors
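
The relationship between the two abstractions can be sketched in plain Scala without a Spark dependency. This is only an illustration under stated assumptions: `SimpleTransformer`, `SimpleEstimator`, and `MeanCenterer` are hypothetical names, and the "dataset" is reduced to a `Seq[Double]` rather than a DataFrame.

```scala
// Plain-Scala sketch of the Estimator/Transformer contract (no Spark dependency).
// All type and object names here are hypothetical, chosen for illustration.
object EstimatorTransformerSketch {
  // A Transformer maps a dataset to a new dataset.
  trait SimpleTransformer {
    def transform(data: Seq[Double]): Seq[Double]
  }

  // An Estimator fits on data and returns a fitted Transformer (the "model").
  trait SimpleEstimator {
    def fit(data: Seq[Double]): SimpleTransformer
  }

  // The fitted model: holds the learned mean and subtracts it from every value.
  final case class MeanCentererModel(mean: Double) extends SimpleTransformer {
    def transform(data: Seq[Double]): Seq[Double] = data.map(_ - mean)
  }

  // The estimator: learns the mean of the training data.
  object MeanCenterer extends SimpleEstimator {
    def fit(data: Seq[Double]): MeanCentererModel = {
      require(data.nonEmpty, "cannot fit on empty data")
      MeanCentererModel(data.sum / data.size)
    }
  }
}
```

In Spark MLlib the same pattern appears, for example, between `LogisticRegression` (an Estimator) and the `LogisticRegressionModel` (a Transformer) that its `fit` method returns.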
Pipeline Models Encode Operational Steps
Pipeline Models Enforce Consistency
● Both training and serving involve pre- and post-transform stages, in addition to raw model
fitting and inference, and these stages need to be consistent:
○ Data Transformations
○ Feature Extraction and Pre-Processing
○ ML Model Raw Predictions
○ Post-Prediction Transformations
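
One way to picture the consistency argument: if the pipeline is a single ordered composition of stages, batch and single-instance scoring can both call the same composed function. The sketch below is plain Scala with made-up stage logic (the scaling, the linear "model", and the rounding are assumptions for illustration, not Michelangelo's stages).

```scala
// Sketch: one composed pipeline function shared by batch and single-row scoring.
// Stage names and arithmetic are illustrative assumptions.
object PipelineConsistencySketch {
  type Row = Map[String, Double]

  // Feature pre-processing stage.
  val preprocess: Row => Row =
    row => row + ("x_scaled" -> row("x") / 10.0)

  // Raw model prediction stage.
  val predict: Row => Row =
    row => row + ("raw_prediction" -> (2.0 * row("x_scaled") + 1.0))

  // Post-prediction transformation stage.
  val postprocess: Row => Row =
    row => row + ("prediction" -> math.round(row("raw_prediction")).toDouble)

  // The pipeline is the ordered composition of its stages.
  val pipeline: Row => Row = preprocess.andThen(predict).andThen(postprocess)

  // Both entry points run exactly the same code path.
  def scoreBatch(rows: Seq[Row]): Seq[Row] = rows.map(pipeline)
  def scoreOne(row: Row): Row = pipeline(row)
}
```

Because `scoreBatch` and `scoreOne` delegate to the same `pipeline` value, a change to any stage affects both paths identically.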
ML Workflow In Practice
Pipeline Models Encapsulate Complexity
Complexity arises from Different Workflow Needs
Complexity arises from Different User Needs
● Research Scientists / Data Scientists / Research/ML Engineers
● Data Analysts / Data Engineers / Software Engineers
● ML Engineers / Production Engineers
Evolution Goal: Retain Performance and Consistency
● Requirement 1: Performant distributed batch serving that comes with the
DataFrame-based execution model on top of Spark’s SQL Engine
● Requirement 2: Low-latency (P99 latency <10ms), high throughput
solution for real-time serving
● Requirement 3: Support consistency in batch and real-time prediction
accuracy by running through common code paths whenever practical
Evolution Goal: Increase Flexibility and Velocity
● Requirement 1: Flexibility in model definitions: libraries, frameworks
○ Allow users to define model pipelines (custom Estimator/Transformer)
○ Train and serve those models efficiently
● Requirement 2: Flexibility in Michelangelo use
○ Decouple its monolithic structure into components
○ Allow interoperability with non-Michelangelo components / pipelines
● Requirement 3: Faster / Easier Spark upgrade path
○ Replace custom protobuf model representation
○ Formalize online serving APIs
Evolve: Replacing Protobuf Model Representation
● Considered MLeap, PMML, PFA, Spark PipelineModel: all supported in Spark MLlib
○ MLeap: non-standard, impacting interoperability w/ Spark compliant ser/de
○ MLeap, PMML, PFA: Lag in supporting new Spark Transformers
○ MLeap, PMML, PFA: Risk of inconsistent model training/serving behavior
● Wanted to choose Spark PipelineModel representation for Michelangelo models
○ Avoids above shortcomings
○ Provides simple interface for adding estimators/transformers
○ But has challenges in Online Serving (see Pentreath’s Spark Summit 2018 talk)
■ Spark MLlib PipelineModel load latency too large
■ Spark MLlib serving APIs too slow for online serving
Spark PipelineModel Representation
● Spark PipelineModel format example file structure
├── 0_strIdx_9ec54829bd7c
│   ├── data/part-00000-a9f31485-4200-4845-8977-8aec7fa03157.snappy.parquet
│   └── metadata/part-00000
├── 1_strIdx_5547304a5d3d
│   ├── data/part-00000-163942b9-a194-4023-b477-a5bfba236eb0.snappy.parquet
│   └── metadata/part-00000
├── 2_vecAssembler_29b5569f2d98
│   └── metadata/part-00000
├── 3_glm_0b885f8f0843
│   ├── data/part-00000-0ead8860-f596-475f-96f3-5b10515f075e.snappy.parquet
│   └── metadata/part-00000
└── 4_idxToStr_968f207b70f2
    └── metadata/part-00000
● Format Read/Written by Spark MLReadable/MLWritable
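// Implemented by companion objects of types that can be loaded from a saved path
trait MLReadable[T] {
  def read: org.apache.spark.ml.util.MLReader[T]
  def load(path: String): T
}
// Implemented by instances that can be saved to a path
trait MLWritable {
  def write: org.apache.spark.ml.util.MLWriter
  def save(path: String): Unit
}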
Challenge: Spark PipelineModel Load Latency
● Zipped Spark Pipeline and protobuf files were of comparable size (up to tens of MBs)
● Spark Pipeline load latency was very high relative to custom protobuf load latency
● This impacts online serving resource agility and health monitoring
Pipeline Model Type           Spark Pipeline / Protobuf Load
GBDT Regression               21.22x
GBDT Binary Classification    28.63x
Linear Regression             29.94x
Logistic Regression           43.97x
RF Binary Classification      8.05x
RF Regression                 12.16x
Tuning Load Latency: Part 1
Replaced sc.textFile with local metadata read
● DefaultParamsReadable.load uses sc.textFile
● Forming an RDD of strings for a small one-line file was slower than a simple load
● Replaced with Java I/O for the local file case, which was much faster
○ Updated loadMetadata method in
mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
● Big reduction in latency of metadata read
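
The idea of the local read can be sketched with plain Java NIO from Scala. This is not the patch itself: `readMetadataLine` is a hypothetical helper, and it assumes the stage directory is on the local filesystem and that the metadata sits in a single-line file at `metadata/part-00000`, as in the file structure shown earlier.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Sketch: read a stage's one-line metadata file directly, without building an RDD.
// The helper name and the local-filesystem assumption are illustrative.
object LocalMetadataReadSketch {
  // Returns the first line of <stageDir>/metadata/part-00000.
  def readMetadataLine(stageDir: Path): String = {
    val metadataFile = stageDir.resolve("metadata").resolve("part-00000")
    val reader = Files.newBufferedReader(metadataFile, StandardCharsets.UTF_8)
    try {
      val line = reader.readLine()
      require(line != null, s"empty metadata file: $metadataFile")
      line
    } finally reader.close()
  }
}
```

The returned string would then be parsed as JSON in the same way as the line obtained from the distributed read.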
Tuning Load Latency: Part 2
Replaced sparkSession.read.parquet with ParquetUtil.read
● Spark distributed read/select for small Transformer data was very slow
● Replaced with direct parquet read/getRecord, which was much faster
○ Relevant to Transformers like LogisticRegression,
StringIndexer, LinearRegression
● Significant reduction in latency of Transformer data read
Tuning Load Latency: Part 3
Updated Tree Ensemble model data save and load to use Parquet directly
● Coalesced tree ensemble node and metadata weights DataFrames at save
time to avoid writing large number of small files that are slow to read
● Loading Tree Ensemble models invoked groupByKey and sortByKey
○ Spark distributed read/select/sort/collect was very slow
● Replaced with direct parquet read/getRecord, which was much faster
● Significant reduction in latency of tree ensemble data read
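
For model data that fits comfortably in memory, the grouping and sorting can be done on local collections once the records have been read. The sketch below assumes a simplified, hypothetical `NodeRecord` (the real node data carries more fields) and shows only the local group-and-sort step, not the Parquet read.

```scala
// Sketch: rebuild per-tree node lists locally instead of with distributed
// groupByKey/sortByKey. NodeRecord is a simplified, hypothetical record type.
object LocalTreeGroupingSketch {
  // A flattened node record, as might be read row by row from a Parquet file.
  final case class NodeRecord(treeId: Int, nodeId: Int, prediction: Double)

  // Group records by tree, then order trees and their nodes by id, in local memory.
  def groupIntoTrees(records: Seq[NodeRecord]): Vector[Vector[NodeRecord]] =
    records
      .groupBy(_.treeId)
      .toVector
      .sortBy { case (treeId, _) => treeId }
      .map { case (_, nodes) => nodes.sortBy(_.nodeId).toVector }
}
```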
Before and After: Tuned Pipeline Load Latency
Greatly improved MLlib load latency, while retaining current on-disk format!
Pipeline Model Type           Spark Pipeline / Protobuf Load    Tuned Spark Pipeline / Protobuf Load
GBDT Regression               21.22x                            2.05x
GBDT Binary Classification    28.63x                            2.50x
Linear Regression             29.94x                            2.03x
Logistic Regression           43.97x                            2.88x
RF Binary Classification      8.05x                             3.14x
RF Regression                 12.16x                            3.01x
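
Since both columns are ratios against the same protobuf baseline, dividing the untuned ratio by the tuned ratio gives the load-latency speedup from tuning alone: roughly 10x for GBDT Regression, about 15x for Logistic Regression, and about 2.6x for RF Binary Classification. A small sketch of that arithmetic, using only the figures from the table (the object and value names are illustrative):

```scala
// Sketch: speedup from tuning = untuned ratio / tuned ratio
// (the protobuf baseline cancels out). Figures are copied from the table.
object LoadLatencySpeedup {
  // (model type, untuned Spark/protobuf ratio, tuned Spark/protobuf ratio)
  val ratios: Seq[(String, Double, Double)] = Seq(
    ("GBDT Regression", 21.22, 2.05),
    ("GBDT Binary Classification", 28.63, 2.50),
    ("Linear Regression", 29.94, 2.03),
    ("Logistic Regression", 43.97, 2.88),
    ("RF Binary Classification", 8.05, 3.14),
    ("RF Regression", 12.16, 3.01)
  )

  val speedups: Seq[(String, Double)] =
    ratios.map { case (name, before, after) => name -> before / after }

  def main(args: Array[String]): Unit =
    speedups.foreach { case (name, s) => println(f"$name%-26s $s%.1fx") }
}
```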
Challenge: SparkContext Cleaner Performance
● Michelangelo online serving creates local SparkContext to handle load of
any unoptimized Transformers
● Periodic context cleaner runs induced non-trivial latency in serving request
responses
● Solution: Stopped SparkContext when models not actively being loaded.
○ Model load only happens at service startup or when new models are
deployed into production online serving
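
The "run only while loading" pattern can be sketched generically. Everything below is an assumption for illustration: `Stoppable` stands in for a local SparkContext, and `LoadScoped` is a hypothetical wrapper that starts the resource when the first load begins and stops it when the last active load finishes.

```scala
// Sketch: keep an expensive resource alive only while loads are in progress.
// Stoppable and LoadScoped are hypothetical names, not Michelangelo or Spark APIs.
object LoadScopedContextSketch {
  // Stand-in for a resource such as a local SparkContext.
  trait Stoppable { def stop(): Unit }

  final class LoadScoped[R <: Stoppable](start: () => R) {
    private var active = 0
    private var resource: Option[R] = None

    // Runs `load` with the resource, starting it if needed and
    // stopping it once no loads remain active.
    def withResource[A](load: R => A): A = {
      val r = synchronized {
        if (resource.isEmpty) resource = Some(start())
        active += 1
        resource.get
      }
      try load(r)
      finally synchronized {
        active -= 1
        if (active == 0) {
          resource.foreach(_.stop())
          resource = None
        }
      }
    }

    def isRunning: Boolean = synchronized(resource.isDefined)
  }
}
```

With this shape, background work tied to the resource (such as periodic cleaner runs) can only occur during model loads, not while requests are being served from already-loaded models.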
Challenge: Serving APIs too slow for online serving
● Added OnlineTransformer trait to Transformers to be served online
○ Single & small list APIs which leverage low-level Spark predict methods
○ Injected at Transformer load time, so pipeline models trained outside of
Michelangelo can be served online by Michelangelo
trait OnlineTransformer {
def scoreInstances(instances: List[Map[String, Any]]): List[Map[String, Any]]
def scoreInstance(instance: Map[String, Any]): Map[String, Any]
}
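
To make the trait concrete, here is a minimal sketch of an implementation, assuming a hypothetical linear model (`OnlineLinearModel`, its weights, and the `prediction` output key are all illustrative). It scores a single instance with plain arithmetic and no DataFrame, and implements the list API by mapping over the single-instance API.

```scala
// Sketch: a hypothetical implementation of the OnlineTransformer trait.
object OnlineTransformerSketch {
  // Trait as shown on the slide.
  trait OnlineTransformer {
    def scoreInstances(instances: List[Map[String, Any]]): List[Map[String, Any]]
    def scoreInstance(instance: Map[String, Any]): Map[String, Any]
  }

  // Hypothetical linear model: prediction = intercept + sum(weight * feature).
  final class OnlineLinearModel(weights: Map[String, Double], intercept: Double)
      extends OnlineTransformer {

    def scoreInstance(instance: Map[String, Any]): Map[String, Any] = {
      val score = weights.foldLeft(intercept) { case (acc, (feature, w)) =>
        instance.get(feature) match {
          case Some(n: Number) => acc + w * n.doubleValue()
          case _               => acc // missing or non-numeric feature contributes nothing
        }
      }
      // Return the input fields plus the prediction.
      instance + ("prediction" -> score)
    }

    def scoreInstances(instances: List[Map[String, Any]]): List[Map[String, Any]] =
      instances.map(scoreInstance)
  }
}
```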
Michelangelo Use of Spark MLlib Evolution Outcome
● Michelangelo is using updated Spark MLlib interface in production
○ Spark PipelineModel on-disk representation
○ Optimized Transformer loads to support online serving
○ OnlineTransformer trait to provide online serving APIs
Example Use Cases Enabled by Evolved Michelangelo (MA) MLlib
● Flexible Pipeline Model Definition
○ Model Pipeline including TFTransformer
● Flexible Use of Michelangelo
○ Train Model in Notebook, Serve Model in Michelangelo
Flexible Pipeline Model Definition
● Interoperability with non-Michelangelo components / pipelines
○ Cross framework, system, language support via Estimators /
Transformers
● Allow customizability of PipelineModel, Estimators, Transformers while
fully integrated into Michelangelo’s Training and Serving infrastructure
○ Combines Spark’s Data Processing with Training using custom
libraries, e.g. XGBoost, TensorFlow
Flexible Pipeline Definition Example: TFTransformer
● Serving TensorFlow Models with TFTransformer
https://eng.uber.com/cota-v2/
○ Spark Pipeline built from training contains both data processing
transformers and TensorFlow transformations (TFTransformer)
○ P95 serving latency < 10ms
○ Combines the distributed computation of Spark and low-latency serving
using CPUs and the acceleration of DL training using GPUs
Serving TF Models using TFTransformer
Flexible Use Example: Train in DSW, Serve in MA
● Decouple Michelangelo into
functional components
● Consolidate custom data
processing, feature engineering,
model definition, train, and
serve around notebook
environments (DSW)
Experiment in DSW, Serve in Michelangelo
Key Learnings in Evolving Michelangelo
● Pipeline representation of models is powerful
○ Encodes all steps in operational modeling
○ Enforces consistency between training and serving
● Pipeline representation of models needs to be flexible
○ Model pipeline can encapsulate complex stages
○ Complexity stems from differing workflow and user needs
Conclusion
● Michelangelo's updated use of Spark MLlib is working well in production
● Propose to open source our changes to Spark MLlib
○ Submitted Spark MLlib Online Serving SPIP
■ https://issues.apache.org/jira/browse/SPARK-26247
○ Posted 2 patches
■ Patch to reduce spark pipeline load latency
■ Patch to add OnlineTransformer trait for online serving APIs
Questions?
31#UnifiedAnalytics #SparkAISummit
WIFI SSID:SparkAISummit | Password: UnifiedAnalytics

Weitere ähnliche Inhalte

Was ist angesagt?

RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기Woong won Lee
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究についてDeep Learning JP
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyAlexandros Karatzoglou
 
ファクター投資と機械学習
ファクター投資と機械学習ファクター投資と機械学習
ファクター投資と機械学習Kei Nakagawa
 
Amosを使ったベイズ推定
Amosを使ったベイズ推定Amosを使ったベイズ推定
Amosを使ったベイズ推定考司 小杉
 
Stock market trend prediction using k nearest neighbor(knn) algorithm
Stock market trend prediction using k nearest neighbor(knn) algorithmStock market trend prediction using k nearest neighbor(knn) algorithm
Stock market trend prediction using k nearest neighbor(knn) algorithmVenkat Projects
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理Yuya Unno
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsBill Liu
 
2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来
2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来
2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来Preferred Networks
 
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + αNIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α佑 甲野
 
2018年01月27日 TensorBoardによる学習の可視化
2018年01月27日 TensorBoardによる学習の可視化2018年01月27日 TensorBoardによる学習の可視化
2018年01月27日 TensorBoardによる学習の可視化aitc_jp
 
5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron) 5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron) 艾鍗科技
 
Poisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric DistributionPoisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric DistributionDataminingTools Inc
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷Eiji Sekiya
 
最適化計算の概要まとめ
最適化計算の概要まとめ最適化計算の概要まとめ
最適化計算の概要まとめYuichiro MInato
 

Was ist angesagt? (20)

RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기
 
TDA for feature selection
TDA for feature selectionTDA for feature selection
TDA for feature selection
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
 
ファクター投資と機械学習
ファクター投資と機械学習ファクター投資と機械学習
ファクター投資と機械学習
 
Amosを使ったベイズ推定
Amosを使ったベイズ推定Amosを使ったベイズ推定
Amosを使ったベイズ推定
 
Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
 
Stock market trend prediction using k nearest neighbor(knn) algorithm
Stock market trend prediction using k nearest neighbor(knn) algorithmStock market trend prediction using k nearest neighbor(knn) algorithm
Stock market trend prediction using k nearest neighbor(knn) algorithm
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
 
強化学習3章
強化学習3章強化学習3章
強化学習3章
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来
2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来
2009/12/10 GPUコンピューティングの現状とスーパーコンピューティングの未来
 
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + αNIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
2018年01月27日 TensorBoardによる学習の可視化
2018年01月27日 TensorBoardによる学習の可視化2018年01月27日 TensorBoardによる学習の可視化
2018年01月27日 TensorBoardによる学習の可視化
 
5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron) 5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron)
 
Poisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric DistributionPoisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric Distribution
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷
 
最適化計算の概要まとめ
最適化計算の概要まとめ最適化計算の概要まとめ
最適化計算の概要まとめ
 

Ähnlich wie Spark MLlib Models in Production: Uber's Michelangelo Platform

Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark MLdatamantra
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsLightbend
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...confluent
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsLightbend
 
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Qualcomm Developer Network
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Servingamesar0
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesDatabricks
 
Developing Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideDeveloping Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideMohanraj Thirumoorthy
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful ServingDatabricks
 
Scale machine learning deployment
Scale machine learning deploymentScale machine learning deployment
Scale machine learning deploymentGang Tao
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkTheodoros Vasiloudis
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsNECST Lab @ Politecnico di Milano
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformGetInData
 
Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018Karthik Murugesan
 
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Costanoa Ventures
 
MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021amesar0
 

Ähnlich wie Spark MLlib Models in Production: Uber's Michelangelo Platform (20)

Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML Models
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
 
KFServing and Feast
KFServing and FeastKFServing and Feast
KFServing and Feast
 
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
 
SparkNet presentation
SparkNet presentationSparkNet presentation
SparkNet presentation
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
 
Developing Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideDeveloping Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's Guide
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
 
Scale machine learning deployment
Scale machine learning deploymentScale machine learning deployment
Scale machine learning deployment
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018
 
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
 
MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Spark MLlib Models in Production: Uber's Michelangelo Platform

  • 2. Using Spark MLlib Models in a Production Training and Serving Platform: Experiences and Extensions
      Anne Holler, Michael Mui (Uber)
      #UnifiedAnalytics #SparkAISummit
  • 3. Introduction
      ● Michelangelo is Uber’s Machine Learning Platform
        ○ Supports training, evaluation, and serving of ML models in production
        ○ Uses Spark MLlib for training and serving at scale
      ● Michelangelo's use of Spark MLlib has evolved over time
        ○ Initially used as part of a monolithic training and serving platform, with hardcoded model pipeline stages saved/loaded from protobuf
        ○ Initially customized to support online serving, with online serving APIs added to Transformers ad hoc
  • 4. What are Spark Pipelines: Estimators and Transformers
      Estimator: Spark abstraction of a learning algorithm or any algorithm that fits or trains on data
      Transformer: Spark abstraction of an ML model stage that includes feature transforms and predictors
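The Estimator/Transformer relationship can be illustrated with a toy sketch in plain Scala. The `ToyEstimator`, `ToyTransformer`, and `MeanCentering` names are hypothetical stand-ins invented for this illustration; they are not Spark's classes and operate on a `Seq[Double]` rather than a DataFrame. The point is only that fitting an estimator on data yields a transformer.

```scala
object ToyPipeline {
  // Toy stand-ins for the Transformer and Estimator abstractions (not Spark's API).
  trait ToyTransformer { def transform(rows: Seq[Double]): Seq[Double] }
  trait ToyEstimator { def fit(rows: Seq[Double]): ToyTransformer }

  // Estimator: learns the mean of the training data.
  object MeanCentering extends ToyEstimator {
    def fit(rows: Seq[Double]): ToyTransformer = {
      require(rows.nonEmpty, "cannot fit on empty data")
      val mean = rows.sum / rows.size
      // The fitted result is a Transformer that applies the learned mean.
      new ToyTransformer {
        def transform(in: Seq[Double]): Seq[Double] = in.map(_ - mean)
      }
    }
  }
}
```

Calling `ToyPipeline.MeanCentering.fit(trainingData)` returns a transformer that can then be applied to unseen data, which mirrors how a fitted pipeline is reused at serving time.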
  • 6. Pipeline Models Enforce Consistency
      ● Both training and serving involve pre- and post-transform stages, in addition to raw fitting and inference from the ML model, and these stages need to be consistent:
        ○ Data Transformations
        ○ Feature Extraction and Pre-Processing
        ○ ML Model Raw Predictions
        ○ Post-Prediction Transformations
  • 7. ML Workflow In Practice: Pipeline Models Encapsulate Complexity
  • 8. Complexity arises from Different Workflow Needs
  • 9. Complexity arises from Different User Needs
      Research Scientists / Data Scientists / Research/ML Engineers
      Data Analysts / Data Engineers / Software Engineers
      ML Engineers / Production Engineers
  • 10. Evolution Goal: Retain Performance and Consistency
      ● Requirement 1: Performant distributed batch serving that comes with the DataFrame-based execution model on top of Spark’s SQL Engine
      ● Requirement 2: Low-latency (P99 latency <10ms), high throughput solution for real-time serving
      ● Requirement 3: Support consistency in batch and real-time prediction accuracy by running through common code paths whenever practical
  • 11. Evolution Goal: Increase Flexibility and Velocity
      ● Requirement 1: Flexibility in model definitions: libraries, frameworks
        ○ Allow users to define model pipelines (custom Estimator/Transformer)
        ○ Train and serve those models efficiently
      ● Requirement 2: Flexibility in Michelangelo use
        ○ Decouple its monolithic structure into components
        ○ Allow interoperability with non-Michelangelo components / pipelines
      ● Requirement 3: Faster / Easier Spark upgrade path
        ○ Replace custom protobuf model representation
        ○ Formalize online serving APIs
  • 12. Evolve: Replacing Protobuf Model Representation
      ● Considered MLeap, PMML, PFA, Spark PipelineModel: all supported in Spark MLlib
        ○ MLeap: non-standard, impacting interoperability w/ Spark compliant ser/de
        ○ MLeap, PMML, PFA: Lag in supporting new Spark Transformers
        ○ MLeap, PMML, PFA: Risk of inconsistent model training/serving behavior
      ● Wanted to choose Spark PipelineModel representation for Michelangelo models
        ○ Avoids above shortcomings
        ○ Provides simple interface for adding estimators/transformers
        ○ But has challenges in Online Serving (see Pentreath’s Spark Summit 2018 talk)
          ■ Spark MLlib PipelineModel load latency too large
          ■ Spark MLlib serving APIs too slow for online serving
  • 13. Spark PipelineModel Representation
      ● Spark PipelineModel format example file structure
          ├── 0_strIdx_9ec54829bd7c
          │   ├── data part-00000-a9f31485-4200-4845-8977-8aec7fa03157.snappy.parquet
          │   ├── metadata part-00000
          ├── 1_strIdx_5547304a5d3d
          │   ├── data part-00000-163942b9-a194-4023-b477-a5bfba236eb0.snappy.parquet
          │   ├── metadata part-00000
          ├── 2_vecAssembler_29b5569f2d98
          │   ├── metadata part-00000
          ├── 3_glm_0b885f8f0843
          │   ├── data part-00000-0ead8860-f596-475f-96f3-5b10515f075e.snappy.parquet
          │   └── metadata part-00000
          └── 4_idxToStr_968f207b70f2
              ├── metadata part-00000
      ● Format Read/Written by Spark MLReadable/MLWritable
          trait MLReadable[T] {
            def read: org.apache.spark.ml.util.MLReader[T]
            def load(path: scala.Predef.String): T
          }
          trait MLWritable {
            def write: org.apache.spark.ml.util.MLWriter
            def save(path: scala.Predef.String): Unit
          }
  • 14. Challenge: Spark PipelineModel Load Latency
      ● Zipped Spark Pipeline and protobuf files were comparable sizes (up to 10s of MBs)
      ● Spark Pipeline load latency was very high relative to custom protobuf load latency
      ● Impacts online serving resource agility and health monitoring

      Pipeline Model Type           Spark Pipeline / Protobuf Load
      GBDT Regression               21.22x
      GBDT Binary Classification    28.63x
      Linear Regression             29.94x
      Logistic Regression           43.97x
      RF Binary Classification      8.05x
      RF Regression                 12.16x
  • 15. Tuning Load Latency: Part 1
      Replaced sc.textFile with local metadata read
      ● DefaultParamsReadable.load uses sc.textFile
      ● Forming an RDD of strings for a small 1-line file was slower than a simple load
      ● Replaced with Java I/O for the local file case, which was much faster
        ○ Updated loadMetadata method in mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
      ● Big reduction in latency of metadata read
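The idea behind Part 1 can be sketched with plain Java NIO: when the model sits on local disk, the one-line metadata file of a stage can be read directly rather than through an RDD. This is a minimal sketch, not the actual patch to `loadMetadata`; the `LocalMetadataRead` object, its method names, and the sample JSON are hypothetical, and it assumes the usual `metadata/part-00000` layout of a saved stage.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object LocalMetadataRead {
  // Hypothetical helper: reads the single-line JSON metadata of a stage
  // straight from local disk instead of building an RDD for it.
  def readMetadataLine(stageDir: Path): String = {
    val metadataFile = stageDir.resolve("metadata").resolve("part-00000")
    val lines = Files.readAllLines(metadataFile, StandardCharsets.UTF_8)
    require(!lines.isEmpty, s"empty metadata file: $metadataFile")
    lines.get(0)
  }

  // Builds a throwaway stage directory so the sketch can be run end to end.
  def demo(): String = {
    val stageDir = Files.createTempDirectory("0_strIdx_demo")
    val metadataDir = Files.createDirectories(stageDir.resolve("metadata"))
    // Made-up metadata content, for illustration only.
    val json = """{"class":"org.apache.spark.ml.feature.StringIndexerModel","uid":"strIdx_demo"}"""
    Files.write(metadataDir.resolve("part-00000"), json.getBytes(StandardCharsets.UTF_8))
    readMetadataLine(stageDir)
  }
}
```

A real implementation would still need a fallback to the distributed read for non-local paths such as HDFS, which this sketch leaves out.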
  • 16. Tuning Load Latency: Part 2
      Replaced sparkSession.read.parquet with ParquetUtil.read
      ● Spark distributed read/select for small Transformer data was very slow
      ● Replaced with direct parquet read/getRecord, which was much faster
        ○ Relevant to Transformers like LogisticRegression, StringIndexer, LinearRegression
      ● Significant reduction in latency of Transformer data read
  • 17. Tuning Load Latency: Part 3
      Updated Tree Ensemble model data save and load to use Parquet directly
      ● Coalesced tree ensemble node and metadata weights DataFrames at save time to avoid writing large number of small files that are slow to read
      ● Loading Tree Ensemble models invoked a groupByKey, sortByKey
        ○ Spark distributed read/select/sort/collect was very slow
      ● Replaced with direct parquet read/getRecord, which was much faster
      ● Significant reduction in latency of tree ensemble data read
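Once node records are read directly into memory, the grouping and ordering that previously ran as distributed groupByKey/sortByKey steps can be done with ordinary collections. The sketch below only illustrates that idea; the `NodeRecord` shape and the `LocalTreeAssembly` object are assumptions made for the example and do not reflect the actual on-disk schema or the authors' patch.

```scala
object LocalTreeAssembly {
  // Hypothetical flattened record, one per tree node.
  final case class NodeRecord(treeId: Int, nodeId: Int, prediction: Double)

  // Groups nodes by tree, then orders trees and their nodes, all in memory.
  def assemble(records: Seq[NodeRecord]): Vector[Vector[NodeRecord]] =
    records
      .groupBy(_.treeId)
      .toVector
      .sortBy { case (treeId, _) => treeId }
      .map { case (_, nodes) => nodes.sortBy(_.nodeId).toVector }
}
```

For model files in the tens of megabytes, as described earlier in the deck, an in-memory pass like this avoids scheduling any Spark job at load time.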
  • 18. Before and After: Tuned Pipeline Load Latency
      Greatly improved MLlib load latency, while retaining current on-disk format!

      Pipeline Model Type           Spark Pipeline / Protobuf Load    Tuned Spark Pipeline / Protobuf Load
      GBDT Regression               21.22x                            2.05x
      GBDT Binary Classification    28.63x                            2.50x
      Linear Regression             29.94x                            2.03x
      Logistic Regression           43.97x                            2.88x
      RF Binary Classification      8.05x                             3.14x
      RF Regression                 12.16x                            3.01x
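The two columns can be turned into a per-model improvement factor by dividing the untuned ratio by the tuned ratio. This is a small arithmetic sketch under the assumption that the protobuf baseline is the same in both columns, which the slide does not state explicitly; the `LoadLatencyGain` object is a name made up for the example.

```scala
object LoadLatencyGain {
  // (model type, untuned ratio, tuned ratio), copied from the slide's table.
  val ratios: Seq[(String, Double, Double)] = Seq(
    ("GBDT Regression", 21.22, 2.05),
    ("GBDT Binary Classification", 28.63, 2.50),
    ("Linear Regression", 29.94, 2.03),
    ("Logistic Regression", 43.97, 2.88),
    ("RF Binary Classification", 8.05, 3.14),
    ("RF Regression", 12.16, 3.01)
  )

  // Improvement factor = untuned ratio / tuned ratio
  // (assumes an identical protobuf baseline for both columns).
  def improvement: Seq[(String, Double)] =
    ratios.map { case (name, before, after) => (name, before / after) }

  def main(args: Array[String]): Unit =
    improvement.foreach { case (name, gain) => println(f"$name%-28s $gain%.1fx") }
}
```

Under that assumption, the linear models gain the most from the tuning and the random forest models the least.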
  • 19. Challenge: SparkContext Cleaner Performance
      ● Michelangelo online serving creates local SparkContext to handle load of any unoptimized Transformers
      ● Periodic context cleaner runs induced non-trivial latency in serving request responses
      ● Solution: Stopped SparkContext when models not actively being loaded
        ○ Model load only happens at service startup or when new models are deployed into production online serving
  • 20. Challenge: Serving APIs too slow for online serving
      ● Added OnlineTransformer trait to Transformers to be served online
        ○ Single & small list APIs which leverage low-level spark predict methods
        ○ Injected at Transformer load time, so pipeline models trained outside of Michelangelo can be served online by Michelangelo
          trait OnlineTransformer {
            def scoreInstances(instances: List[Map[String, Any]]): List[Map[String, Any]]
            def scoreInstance(instance: Map[String, Any]): Map[String, Any]
          }
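To make the shape of the OnlineTransformer API concrete, here is a toy implementation in plain Scala. The trait is reproduced from the slide; the `ToyLogisticScorer` class, its constructor parameters, the `"prediction"` output key, and the treatment of missing features are all assumptions for illustration, not Michelangelo's implementation.

```scala
// Trait as shown on the slide.
trait OnlineTransformer {
  def scoreInstances(instances: List[Map[String, Any]]): List[Map[String, Any]]
  def scoreInstance(instance: Map[String, Any]): Map[String, Any]
}

// Hypothetical stage: a logistic scorer over named numeric features.
final class ToyLogisticScorer(weights: Map[String, Double], intercept: Double)
    extends OnlineTransformer {

  def scoreInstance(instance: Map[String, Any]): Map[String, Any] = {
    // Missing or non-numeric features contribute 0.0 in this sketch.
    val margin = weights.foldLeft(intercept) { case (acc, (name, w)) =>
      val x = instance.get(name) match {
        case Some(n: Number) => n.doubleValue()
        case _               => 0.0
      }
      acc + w * x
    }
    // Returns the input row with the predicted probability appended.
    instance + ("prediction" -> 1.0 / (1.0 + math.exp(-margin)))
  }

  // The small-list API simply scores each instance in turn.
  def scoreInstances(instances: List[Map[String, Any]]): List[Map[String, Any]] =
    instances.map(scoreInstance)
}
```

Scoring works on plain maps, so a single request never has to be wrapped in a DataFrame, which is the overhead the slide identifies as too slow for online serving.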
  • 21. Michelangelo Use of Spark MLlib Evolution Outcome
      ● Michelangelo is using updated Spark MLlib interface in production
        ○ Spark PipelineModel on-disk representation
        ○ Optimized Transformer loads to support online serving
        ○ OnlineTransformer trait to provide online serving APIs
  • 22. Example Use Cases Enabled by Evolved MA MLlib
      ● Flexible Pipeline Model Definition
        ○ Model Pipeline including TFTransformer
      ● Flexible Use of Michelangelo
        ○ Train Model in Notebook, Serve Model in Michelangelo
  • 23. Flexible Pipeline Model Definition
      ● Interoperability with non-Michelangelo components / pipelines
        ○ Cross framework, system, language support via Estimators / Transformers
      ● Allow customizability of PipelineModel, Estimators, Transformers while fully integrated into Michelangelo’s Training and Serving infrastructure
        ○ Combines Spark’s Data Processing with Training using custom libraries, e.g. XGBoost, TensorFlow
  • 24. Flexible Pipeline Definition Example: TFTransformer
      ● Serving TensorFlow Models with TFTransformer https://eng.uber.com/cota-v2/
        ○ Spark Pipeline built from training contains both data processing transformers and TensorFlow transformations (TFTransformer)
        ○ P95 serving latency < 10ms
        ○ Combines the distributed computation of Spark and low-latency serving using CPUs and the acceleration of DL training using GPUs
  • 25. Serving TF Models using TFTransformer
  • 26. Flexible Use Example: Train in DSW, Serve in MA
      ● Decouple Michelangelo into functional components
      ● Consolidate custom data processing, feature engineering, model definition, train, and serve around notebook environments (DSW)
  • 28. Key Learnings in Evolving Michelangelo
      ● Pipeline representation of models is powerful
        ○ Encodes all steps in operational modeling
        ○ Enforces consistency between training and serving
      ● Pipeline representation of models needs to be flexible
        ○ Model pipeline can encapsulate complex stages
        ○ Complexity stems from differing workflow and user needs
  • 29. Conclusion
      ● Michelangelo's updated use of Spark MLlib is working well in production
      ● Propose to open source our changes to Spark MLlib
        ○ Submitted Spark MLlib Online Serving SPIP
          ■ https://issues.apache.org/jira/browse/SPARK-26247
        ○ Posted 2 patches
          ■ Patch to reduce Spark pipeline load latency
          ■ Patch to add OnlineTransformer trait for online serving APIs