HIGH PERFORMANCE MODEL SERVING WITH
KUBERNETES AND ISTIO…
…AND AWS SAGEMAKER, GOOGLE CLOUD ML,
AZURE ML!
CHRIS FREGLY
FOUNDER @ PIPELINE.AI
RECENT PIPELINE.AI NEWS
Sept 2017
Dec 2017
INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Advanced Spark and TensorFlow Meetup
§ Please Join Our 60,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
INTRODUCTIONS: YOU
§ Software Engineer, DevOps Engineer, Data {Scientist, Engineer}
§ Interested in Optimizing and Deploying TF Models to Production
§ Nice to Have a Working Knowledge of TensorFlow (Not Required)
PIPELINE.AI IS 100% OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ Some VCs Value GitHub Stars @ $1,500 Each (?!)
PIPELINE.AI OVERVIEW
450,000 Docker Downloads
60,000 Users Registered for GA
60,000 Meetup Members
40,000 LinkedIn Followers
2,200 GitHub Stars
12 Enterprise Beta Users
WHY HEAVY FOCUS ON MODEL SERVING?
Model Training
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ Optimizations Very Well-Known
§ 100’s Training Jobs per Day
<<<
Model Serving
§ Real-Time & Exciting!!
§ Online in Live Production
§ Pipeline Extends into Production
§ Continuous Insight into Live Production
§ Huuuuuuge Number of Application Users
§ **Many Optimizations Not Yet Utilized
§ 1,000,000’s Predictions per Sec
AGENDA
Part 0: Latest PipelineAI Research
Part 1: PipelineAI + Kubernetes + Istio
AGENDA
Part 0: Latest PipelineAI Research
§ Deploy, Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
PACKAGE MODEL + RUNTIME AS ONE
§ Build Model with Runtime into Immutable Docker Image
§ Emphasize Immutable Deployment and Infrastructure
§ Same Runtime Dependencies in All Environments
§ Local, Development, Staging, Production
§ No Library or Dependency Surprises
§ Deploy and Tune Model + Runtime Together
Build Local Model Server A
pipeline predict-server-build --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --model-path=./models/tensorflow/mnist/
LOAD TEST LOCAL MODEL + RUNTIME
§ Perform Mini-Load Test on Local Model Server
§ Immediate, Local Prediction Performance Metrics
§ Compare to Previous Model + Runtime Variations
Start Local Model Server A
pipeline predict-server-start --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A
Load Test Local Model Server A
pipeline predict --model-endpoint-url=http://localhost:8080 \
    --test-request-path=test_request.json \
    --test-request-concurrency=1000
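The idea of a mini-load test can be sketched in a few lines of Python. This is only an illustration, not how the `pipeline predict` command is implemented: `mini_load_test` and `fake_model_server` are hypothetical names, and the fake server stands in for an HTTP POST to the local model server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def mini_load_test(send_request, payload, concurrency=10, total_requests=100):
    """Call send_request(payload) total_requests times on `concurrency` threads.

    Returns the request count plus mean and max latency in seconds.
    """
    def timed_call(_):
        start = time.perf_counter()
        send_request(payload)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))

    return {
        "requests": len(latencies),
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
    }


def fake_model_server(payload):
    # Hypothetical stand-in for an HTTP POST to a local model server.
    time.sleep(0.001)
    return {"echo": payload}


if __name__ == "__main__":
    print(mini_load_test(fake_model_server, {"image": [0.0] * 784}))
```

Swapping `fake_model_server` for a function that posts to the real endpoint would give comparable numbers for each model + runtime variation.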
PUSH IMAGE TO DOCKER REGISTRY
§ Supports All Public + Private Docker Registries
§ DockerHub, Artifactory, Quay, AWS, Google, …
§ Or Self-Hosted, Private Docker Registry
Push Image To Docker Registry
pipeline predict-server-push --image-registry-url=<your-registry> \
    --image-registry-repo=<your-repo> \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A
CLOUD-BASED OPTIONS
§ AWS SageMaker
§ Released Nov 2017 @ re:Invent
§ Custom Docker Images for Training/Serving (e.g., PipelineAI Images)
§ Distributed TensorFlow Training through Estimator API
§ Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
§ Mostly Command-Line Based
§ Driving TensorFlow Open Source API (e.g., Experiment API)
§ Azure ML
TUNE MODEL + RUNTIME AS SINGLE UNIT
§ Model Training Optimizations
§ Model Hyper-Parameters (e.g., Learning Rate)
§ Reduced Precision (e.g., FP16 Half Precision)
§ Post-Training Model Optimizations
§ Quantize Model Weights + Activations From 32-bit to 8-bit
§ Fuse Neural Network Layers Together
§ Model Runtime Optimizations
§ Runtime Configs (e.g., Request Batch Size)
§ Different Runtimes (e.g., TensorFlow Lite, Nvidia TensorRT)
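To make the 32-bit to 8-bit idea concrete, here is a minimal sketch of linear weight quantization in plain Python. It is an assumption-laden illustration, not the algorithm used by any particular TensorFlow tool: the function names are hypothetical and activations are ignored.

```python
def quantize_weights(weights):
    """Map float weights onto unsigned 8-bit codes (0..255).

    Returns (codes, scale, minimum) so values can be approximately restored.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 when all weights are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_weights(codes, scale, lo):
    """Approximate the original floats from their 8-bit codes."""
    return [c * scale + lo for c in codes]


if __name__ == "__main__":
    original = [-1.0, -0.25, 0.0, 0.5, 1.0]
    codes, scale, lo = quantize_weights(original)
    print(codes)
    print(dequantize_weights(codes, scale, lo))
```

Each weight now needs one byte instead of four, at the price of a rounding error of at most half a quantization step.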
POST-TRAINING OPTIMIZATIONS
§ Prepare Model for Serving
§ Simplify Network
§ Reduce Model Size
§ Lower Precision for Fast Math
§ Some Tools
§ Graph Transform Tool (GTT)
§ tfcompile
[Figure: model graph After Training vs. After Optimizing!]
pipeline optimize --optimization-list=[quantize_weights,tfcompile] \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --model-path=./tensorflow/mnist/model \
    --output-path=./tensorflow/mnist/optimized_model
Linear Regression
RUNTIME OPTION: TENSORFLOW LITE
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
§ Low-Latency, Fast Startup
§ Selective Operator Loading
§ 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
§ Falls Back to CPU without Accelerator
§ Java and C++ APIs
RUNTIME OPTION: NVIDIA TENSORRT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
DEPLOY MODELS SAFELY TO PROD
§ Deploy from CLI or Jupyter Notebook
§ Tear-Down or Rollback Models Quickly
§ Shadow Canary Deploy: e.g., 20% Live Traffic
§ Split Canary Deploy: e.g., 97-2-1% Live Traffic
Start Production Model Cluster B
pipeline predict-cluster-start --model-runtime=tflite \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=B \
    --traffic-split=2
Start Production Model Cluster C
pipeline predict-cluster-start --model-runtime=tensorrt \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=C \
    --traffic-split=1
Start Production Model Cluster A
pipeline predict-cluster-start --model-runtime=tfserving_gpu \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --traffic-split=97
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation + Training Accuracy
§ CPU + GPU Utilization
§ Live Prediction Values
§ Compare Relative Precision
§ Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
§ Response Time, Throughput
§ Cost ($) Per Prediction
VIEW REAL-TIME PREDICTION STREAM
§ Visually Compare Real-Time Predictions
[Figure: real-time prediction stream showing Prediction Inputs and Prediction Results & Confidences for Model A, Model B, and Model C]
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ 3 Steps in Real-Time Prediction
1. transform_request()
2. predict()
3. transform_response()
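The three steps above can be timed individually to see where a prediction spends its time. The sketch below is a hedged illustration: the step names come from the slide, but their bodies (JSON parsing, a model that doubles its inputs) and the `profiled_predict` helper are hypothetical.

```python
import json
import time


def transform_request(raw_request: bytes) -> list:
    # Hypothetical: parse JSON bytes into model inputs.
    return json.loads(raw_request)["inputs"]


def predict(inputs: list) -> list:
    # Hypothetical stand-in for the model: doubles each input.
    return [2 * x for x in inputs]


def transform_response(outputs: list) -> bytes:
    # Hypothetical: serialize model outputs back to JSON bytes.
    return json.dumps({"outputs": outputs}).encode("utf-8")


def profiled_predict(raw_request: bytes):
    """Run the three steps in order and record wall-clock time per step."""
    timings = {}
    value = raw_request
    for step in (transform_request, predict, transform_response):
        start = time.perf_counter()
        value = step(value)
        timings[step.__name__] = time.perf_counter() - start
    return value, timings


if __name__ == "__main__":
    response, timings = profiled_predict(b'{"inputs": [1, 2, 3]}')
    print(response, timings)
```

Per-step timings make it clear whether a slow prediction is caused by the model itself or by the code around it.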
LIVE, ADAPTIVE TRAFFIC ROUTING
§ A/B Tests
§ Inflexible and Boring
§ Multi-Armed Bandits
§ Adaptive and Exciting!
Adjust Traffic Routing Dynamically
pipeline traffic-router-split --model-type=tensorflow \
    --model-name=mnist \
    --model-tag-list=[A,B,C] \
    --model-weight-list=[1,2,97]
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model using AI Bandit Algos
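One simple bandit strategy is epsilon-greedy, sketched below as a way to turn observed rewards into traffic weights. The slides do not say which bandit algorithm PipelineAI uses, so this is an assumed example; the function name and the reward numbers are hypothetical.

```python
def epsilon_greedy_weights(rewards, trials, epsilon=0.1):
    """Return a traffic weight (percentage) per model tag.

    rewards / trials: dicts keyed by model tag, e.g. conversions and requests.
    The tag with the best observed reward rate gets (1 - epsilon) of the
    traffic; epsilon is shared equally among all tags to keep exploring.
    """
    rates = {tag: rewards[tag] / trials[tag] if trials[tag] else 0.0
             for tag in trials}
    best = max(rates, key=rates.get)
    explore_share = epsilon / len(rates)
    return {
        tag: round(100 * (explore_share + (1 - epsilon if tag == best else 0)), 2)
        for tag in rates
    }


if __name__ == "__main__":
    # Made-up numbers: model B converts best, so it receives most traffic.
    weights = epsilon_greedy_weights(
        rewards={"A": 30, "B": 45, "C": 20},
        trials={"A": 1000, "B": 1000, "C": 1000},
        epsilon=0.06,
    )
    print(weights)
```

The resulting weights could then be fed to a traffic split such as the `--model-weight-list` option shown earlier.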
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Based on Cost ($) Per Prediction
§ Cost Changes Throughout Day
§ Lose AWS Spot Instances
§ Google Cloud Becomes Cheaper
§ Shift Across Clouds & On-Prem
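Cost per prediction can be estimated from an hourly price and a sustained throughput. The sketch below assumes exactly that simple model; the target names and prices are invented for illustration.

```python
def cost_per_prediction(hourly_cost_usd, predictions_per_second):
    """Dollar cost of one prediction at a sustained throughput."""
    return hourly_cost_usd / (predictions_per_second * 3600)


def cheapest_target(targets):
    """targets: {name: (hourly_cost_usd, predictions_per_second)}."""
    return min(targets, key=lambda name: cost_per_prediction(*targets[name]))


if __name__ == "__main__":
    # Invented prices and throughputs, for illustration only.
    targets = {
        "aws-spot": (0.90, 500),
        "gcp": (1.20, 800),
        "on-prem": (2.00, 900),
    }
    print(cheapest_target(targets))
```

Re-running this as prices change during the day gives the target that traffic should shift toward.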
LIVE, CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning
§ Q1 2018: PipelineAI Supports Continuous Model Training!
§ Kafka, Kinesis
§ Spark Streaming
PSEUDO-CONTINUOUS TRAINING
§ Identify and Fix Borderline Predictions (~50-50% Confidence)
§ Fix Along Class Boundaries
§ Retrain Newly-Labeled Data
§ Game-ify Labeling Process
§ Enable Crowd Sourcing
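Finding borderline predictions amounts to checking how close the top two class confidences are. A minimal sketch, assuming predictions arrive as a mapping from class label to confidence; the function name and the 0.1 margin are hypothetical choices.

```python
def borderline_predictions(predictions, margin=0.1):
    """Return ids of predictions whose top two confidences differ by <= margin.

    predictions: iterable of (example_id, {class_label: confidence}).
    """
    flagged = []
    for example_id, confidences in predictions:
        top_two = sorted(confidences.values(), reverse=True)[:2]
        if len(top_two) == 2 and top_two[0] - top_two[1] <= margin:
            flagged.append(example_id)
    return flagged


if __name__ == "__main__":
    stream = [
        ("img-1", {"cat": 0.52, "dog": 0.48}),  # near the class boundary
        ("img-2", {"cat": 0.95, "dog": 0.05}),  # confident
    ]
    print(borderline_predictions(stream))
```

The flagged examples are the ones worth sending to human labelers before retraining.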
DEMOS!!
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
AGENDA
Part 0: Latest PipelineAI Research
Part 1: PipelineAI + Kubernetes + Istio
SPECIAL THANKS TO CHRISTIAN POSTA
§ http://blog.christianposta.com/istio-workshop/slides/
KUBERNETES INGRESS
§ Single Service
§ Can also use Service (LoadBalancer or NodePort)
§ Fan Out & Name-Based Virtual Hosting
§ Route Traffic Using Path or Host Header
§ Reduces # of load balancers needed
§ 404 Implemented as default backend
§ Federation / Hybrid-Cloud
§ Creates Ingress objects in every cluster
§ Monitors health and capacity of pods within each cluster
§ Routes clients to appropriate backend anywhere in federation
Fan Out (Path)
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-fanout
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80
Virtual Hosting
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-virtualhost
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: s1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: s2
          servicePort: 80
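The fan-out and virtual-hosting rules boil down to a lookup on host and path. The sketch below models that lookup with longest-prefix matching, which is a simplifying assumption (real ingress controllers differ in how they match paths); `route_ingress` and the default backend name are hypothetical.

```python
def route_ingress(rules, host, path, default_backend="default-http-backend"):
    """rules: {host: [(path_prefix, service_name), ...]}.

    Returns the service for the longest matching path prefix on that host,
    or the default backend (which would answer 404) when nothing matches.
    """
    matches = [
        (prefix, service)
        for prefix, service in rules.get(host, [])
        if path.startswith(prefix)
    ]
    if not matches:
        return default_backend
    return max(matches, key=lambda m: len(m[0]))[1]


if __name__ == "__main__":
    fan_out = {"foo.bar.com": [("/foo", "s1"), ("/bar", "s2")]}
    virtual_hosts = {"foo.bar.com": [("/", "s1")], "bar.foo.com": [("/", "s2")]}
    print(route_ingress(fan_out, "foo.bar.com", "/bar/items"))
    print(route_ingress(virtual_hosts, "bar.foo.com", "/anything"))
    print(route_ingress(fan_out, "foo.bar.com", "/unknown"))
```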
KUBERNETES INGRESS CONTROLLER
§ Ingress Controller Types
§ Google Cloud: kubernetes.io/ingress.class: gce
§ Nginx: kubernetes.io/ingress.class: nginx
§ Istio: kubernetes.io/ingress.class: istio
§ Must Start Ingress Controller Manually
§ Just deploying Ingress is not enough
§ Not started by kube-controller-manager
§ Start Istio Ingress Controller
kubectl apply -f $ISTIO_INSTALL_PATH/install/kubernetes/istio.yaml
ISTIO ARCHITECTURE: ENVOY
§ Lyft Project
§ High-perf Proxy (C++)
§ Lots of Metrics
§ Zone-Aware
§ Service Discovery
§ Load Balancing
§ Fault Injection, Circuits
§ %-based Traffic Split, Shadow
§ Sidecar Pattern
§ Rate Limiting, Retries, Outlier Detection, Timeout with Budget, …
ISTIO ARCHITECTURE: MIXER
§ Enforce Access Control
§ Evaluate Request-Attrs
§ Collect Metrics
§ Platform-Independent
§ Extensible Plugin Model
ISTIO ARCHITECTURE: PILOT
§ Envoy service discovery
§ Intelligent routing
§ A/B Tests
§ Canary deployments
§ RouteRule->Envoy conf
§ Propagates to sidecars
§ Supports Kube, Consul, ...
ISTIO ARCHITECTURE: ISTIO-AUTH
§ Mutual TLS Auth
§ Credential management
§ Uses Service-identity
§ Canary deployments
§ Fine-grained ACLs
§ Attribute & role-based
§ Auditing & monitoring
ISTIO ROUTE RULES
§ Kubernetes Custom Resource Definition (CRD)
kind: CustomResourceDefinition
metadata:
  name: routerules.config.istio.io
spec:
  group: config.istio.io
  names:
    kind: RouteRule
    listKind: RouteRuleList
    plural: routerules
    singular: routerule
  scope: Namespaced
  version: v1alpha2
A/B & BANDIT MODEL TESTING
§ Live Experiments in Production
§ Compare Existing Model A with Model B, Model C
§ Safe Split-Canary Deployment
§ Tip: Keep Ingress Simple – Use Route Rules Instead!
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: live-experiment-20-5-75
spec:
  destination:
    name: predict-mnist
  precedence: 2 # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 20 # 20% still routes to model A
  - labels:
      version: B # 5% routes to new model B
    weight: 5
  - labels:
      version: C # 75% routes to new model C
    weight: 75
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: live-experiment-1-2-97
spec:
  destination:
    name: predict-mnist
  precedence: 2 # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 1 # 1% routes to model A
  - labels:
      version: B # 2% routes to new model B
    weight: 2
  - labels:
      version: C # 97% routes to new model C
    weight: 97
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: live-experiment-97-2-1
spec:
  destination:
    name: predict-mnist
  precedence: 2 # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 97 # 97% still routes to model A
  - labels:
      version: B # 2% routes to new model B
    weight: 2
  - labels:
      version: C # 1% routes to new model C
    weight: 1
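What a weighted route does per request can be sketched as a draw against cumulative weights. In Istio the split is performed by the Envoy sidecars, so the Python below is only an assumed model of the behavior; `make_weighted_router` is a hypothetical helper.

```python
import bisect
import itertools
import random


def make_weighted_router(weights):
    """weights: {version: integer percentage}; percentages must sum to 100."""
    if sum(weights.values()) != 100:
        raise ValueError("route weights must sum to 100")
    versions = list(weights)
    cumulative = list(itertools.accumulate(weights[v] for v in versions))

    def route(rng=random):
        # Draw a point in [0, 100) and find the bucket that contains it.
        point = rng.random() * 100
        return versions[bisect.bisect_right(cumulative, point)]

    return route


if __name__ == "__main__":
    route = make_weighted_router({"A": 97, "B": 2, "C": 1})
    rng = random.Random(0)
    picks = [route(rng) for _ in range(10_000)]
    print({v: picks.count(v) for v in ("A", "B", "C")})
```

Over many requests the observed counts approach the configured 97-2-1 split, which is what the metrics dashboards should confirm in production.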
ISTIO AUTO-SCALING
§ Traffic Routing and Auto-Scaling Occur Independently
§ Istio Continues to Obey Traffic Splits After Auto-Scaling
§ Auto-Scaling May Occur In Response to New Traffic Route
ADVANCED ROUTING RULES
§ Content-based Routing
§ Uses headers, username, payload, …
§ Cross-Environment Routing
§ Shadow traffic prod => staging
ISTIO DESTINATION POLICIES
§ Load Balancing
§ ROUND_ROBIN (default)
§ LEAST_CONN (between 2 randomly-selected hosts)
§ RANDOM
§ Circuit Breaker
§ Max connections
§ Max requests per conn
§ Consecutive errors
§ Penalty timer (15 mins)
§ Scan windows (5 mins)
circuitBreaker:
  simpleCb:
    maxConnections: 100
    httpMaxRequests: 1000
    httpMaxRequestsPerConnection: 10
    httpConsecutiveErrors: 7
    sleepWindow: 15m
    httpDetectionInterval: 5m
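The consecutive-errors and penalty-timer settings can be read as a small state machine. The sketch below is a deliberately simplified model, not Envoy's implementation: it ignores the scan interval and connection limits, and the class name is hypothetical.

```python
import time


class SimpleCircuitBreaker:
    """Eject a host after N consecutive errors, then retry after a sleep window."""

    def __init__(self, consecutive_errors=7, sleep_window_s=15 * 60,
                 clock=time.monotonic):
        self.consecutive_errors = consecutive_errors
        self.sleep_window_s = sleep_window_s
        self.clock = clock
        self.error_count = 0
        self.ejected_at = None

    def allow_request(self):
        if self.ejected_at is None:
            return True
        if self.clock() - self.ejected_at >= self.sleep_window_s:
            # Sleep window elapsed: let traffic through again.
            self.ejected_at = None
            self.error_count = 0
            return True
        return False

    def record(self, success):
        if success:
            self.error_count = 0
            return
        self.error_count += 1
        if self.error_count >= self.consecutive_errors:
            self.ejected_at = self.clock()


if __name__ == "__main__":
    breaker = SimpleCircuitBreaker(consecutive_errors=7, sleep_window_s=15 * 60)
    for _ in range(7):
        breaker.record(success=False)
    print(breaker.allow_request())  # host is ejected until the window passes
```

Passing the clock in as a parameter keeps the breaker testable without waiting for the real 15-minute window.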
ISTIO EGRESS
§ Whitelisted Domains Accessible Within Service Mesh
§ Apply RouteRules and DestinationPolicies
§ Supports TLS, HTTP, GRPC
kind: EgressRule
metadata:
  name: foo-egress-rule
spec:
  destination:
    service: api.pipeline.ai
  ports:
  - port: 80
    protocol: http
  - port: 443
    protocol: https
ISTIO & CHAOS + LATENCY MONKEYS
§ Fault Injection
§ Delay
§ Abort
kind: RouteRule
metadata:
  name: predict-mnist
spec:
  destination:
    name: predict-mnist
  httpFault:
    abort:
      httpStatus: 420
      percent: 100
kind: RouteRule
metadata:
  name: predict-mnist
spec:
  destination:
    name: predict-mnist
  httpFault:
    delay:
      fixedDelay: 7.000s
      percent: 100
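The effect of the abort and delay faults can be imitated in application code by wrapping a request handler. Istio injects these faults at the proxy, so this is just an assumed in-process analogy; `with_fault` and `InjectedAbort` are hypothetical names.

```python
import random
import time


class InjectedAbort(Exception):
    """Raised in place of a real response when an abort fault fires."""

    def __init__(self, status):
        super().__init__(f"injected abort with HTTP status {status}")
        self.status = status


def with_fault(handler, *, abort_status=None, delay_s=0.0, percent=100, rng=random):
    """Wrap handler so that `percent` of calls are delayed and/or aborted."""
    def wrapped(request):
        if rng.random() * 100 < percent:
            if delay_s:
                time.sleep(delay_s)
            if abort_status is not None:
                raise InjectedAbort(abort_status)
        return handler(request)
    return wrapped


if __name__ == "__main__":
    slow = with_fault(lambda request: "ok", delay_s=0.05, percent=100)
    print(slow("predict"))  # succeeds after the injected delay
    broken = with_fault(lambda request: "ok", abort_status=420, percent=100)
    try:
        broken("predict")
    except InjectedAbort as error:
        print(error)
```

Lowering `percent` below 100 exercises client retries and timeouts without taking the service down completely.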
ISTIO METRICS AND MONITORING
§ Verify Traffic Splits
§ Fine-Grained Request Tracing
ISTIO SECURITY
§ Istio Certificate Authority
§ Mutual TLS
THANK YOU!!
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ Reminder: VCs Value GitHub Stars @ $1,500 Each (!!)
Contact Me
chris@pipeline.ai
@cfregly

WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 

PipelineAI + TensorFlow AI + Spark ML + Kubernetes + Istio + AWS SageMaker + Google Cloud ML + Azure ML

  • 1. HIGH PERFORMANCE MODEL SERVING WITH KUBERNETES AND ISTIO… …AND AWS SAGEMAKER, GOOGLE CLOUD ML, AZURE ML! CHRIS FREGLY FOUNDER @ PIPELINE.AI
  • 3. INTRODUCTIONS: ME § Chris Fregly, Founder & Engineer @PipelineAI § Formerly Netflix, Databricks, IBM Spark Tech § Advanced Spark and TensorFlow Meetup § Please Join Our 60,000+ Global Members!! Contact Me chris@pipeline.ai @cfregly Global Locations * San Francisco * Chicago * Austin * Washington DC * Dusseldorf * London
  • 4. INTRODUCTIONS: YOU § Software Engineer, DevOps Engineer, Data {Scientist, Engineer} § Interested in Optimizing and Deploying TF Models to Production § Nice to Have a Working Knowledge of TensorFlow (Not Required)
  • 5. PIPELINE.AI IS 100% OPEN SOURCE § https://github.com/PipelineAI/pipeline/ § Please Star 🌟 this GitHub Repo! § Some VC’s Value GitHub Stars @ $1,500 Each (?!)
  • 6. PIPELINE.AI OVERVIEW 450,000 Docker Downloads 60,000 Users Registered for GA 60,000 Meetup Members 40,000 LinkedIn Followers 2,200 GitHub Stars 12 Enterprise Beta Users
  • 7. WHY HEAVY FOCUS ON MODEL SERVING? Model Training § Batch & Boring § Offline in Research Lab § Pipeline Ends at Training § No Insight into Live Production § Small Number of Data Scientists § Optimizations Very Well-Known § 100’s Training Jobs per Day <<< Model Serving § Real-Time & Exciting!! § Online in Live Production § Pipeline Extends into Production § Continuous Insight into Live Production § Huuuuuuge Number of Application Users § **Many Optimizations Not Yet Utilized § 1,000,000’s Predictions per Sec
  • 8. AGENDA Part 0: Latest PipelineAI Research Part 1: PipelineAI + Kubernetes + Istio
  • 9. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  • 10. PACKAGE MODEL + RUNTIME AS ONE § Build Model with Runtime into Immutable Docker Image § Emphasize Immutable Deployment and Infrastructure § Same Runtime Dependencies in All Environments § Local, Development, Staging, Production § No Library or Dependency Surprises § Deploy and Tune Model + Runtime Together
      Build Local Model Server A:
        pipeline predict-server-build --model-type=tensorflow --model-name=mnist --model-tag=A --model-path=./models/tensorflow/mnist/
  • 11. LOAD TEST LOCAL MODEL + RUNTIME § Perform Mini-Load Test on Local Model Server § Immediate, Local Prediction Performance Metrics § Compare to Previous Model + Runtime Variations
      Start Local Model Server A:
        pipeline predict-server-start --model-type=tensorflow --model-name=mnist --model-tag=A
      Load Test Local Model Server A:
        pipeline predict --model-endpoint-url=http://localhost:8080 --test-request-path=test_request.json --test-request-concurrency=1000
  • 12. PUSH IMAGE TO DOCKER REGISTRY § Supports All Public + Private Docker Registries § DockerHub, Artifactory, Quay, AWS, Google, … § Or Self-Hosted, Private Docker Registry
      Push Image To Docker Registry:
        pipeline predict-server-push --image-registry-url=<your-registry> --image-registry-repo=<your-repo> --model-type=tensorflow --model-name=mnist --model-tag=A
  • 13. CLOUD-BASED OPTIONS § AWS SageMaker § Released Nov 2017 @ re:Invent § Custom Docker Images for Training/Serving (ie. PipelineAI Images) § Distributed TensorFlow Training through Estimator API § Traffic Splitting for A/B Model Testing § Google Cloud ML Engine § Mostly Command-Line Based § Driving TensorFlow Open Source API (ie. Experiment API) § Azure ML
  • 14. TUNE MODEL + RUNTIME AS SINGLE UNIT § Model Training Optimizations § Model Hyper-Parameters (ie. Learning Rate) § Reduced Precision (ie. FP16 Half Precision) § Post-Training Model Optimizations § Quantize Model Weights + Activations From 32-bit to 8-bit § Fuse Neural Network Layers Together § Model Runtime Optimizations § Runtime Configs (ie. Request Batch Size) § Different Runtimes (ie. TensorFlow Lite, Nvidia TensorRT)
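The "Quantize Model Weights + Activations From 32-bit to 8-bit" bullet on slide 14 can be illustrated with a minimal sketch. It assumes a plain linear (min/max) mapping onto 0..255, which is a common textbook scheme and not necessarily what the Graph Transform Tool or PipelineAI does internally. The function names `quantize_weights` and `dequantize_weights` are hypothetical.

```python
from typing import List, Tuple


def quantize_weights(weights: List[float]) -> Tuple[List[int], float, float]:
    """Linearly map float weights onto the 8-bit range 0..255 (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor, where the value range would be zero.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    quantized = [int(round((w - lo) / scale)) for w in weights]
    return quantized, scale, lo


def dequantize_weights(quantized: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate float weights from their 8-bit codes."""
    return [q * scale + lo for q in quantized]
```

Each restored weight differs from the original by at most about half of `scale`, which is the precision given up in exchange for a 4x smaller weight tensor.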
  • 15. POST-TRAINING OPTIMIZATIONS § Prepare Model for Serving § Simplify Network § Reduce Model Size § Lower Precision for Fast Math § Some Tools § Graph Transform Tool (GTT) § tfcompile
      [Figure: Linear Regression. Panels: After Training, After Optimizing!]
      pipeline optimize --optimization-list=[quantize_weights, tfcompile] --model-type=tensorflow --model-name=mnist --model-tag=A --model-path=./tensorflow/mnist/model --output-path=./tensorflow/mnist/optimized_model
  • 16. RUNTIME OPTION: TENSORFLOW LITE § Post-Training Model Optimizations § Currently Supports iOS and Android § On-Device Prediction Runtime § Low-Latency, Fast Startup § Selective Operator Loading § 70KB Min - 300KB Max Runtime Footprint § Supports Accelerators (GPU, TPU) § Falls Back to CPU without Accelerator § Java and C++ APIs
  • 17. RUNTIME OPTION: NVIDIA TENSORRT § Post-Training Model Optimizations § Specific to Nvidia GPU § GPU-Optimized Prediction Runtime § Alternative to TensorFlow Serving § PipelineAI Supports TensorRT!
  • 18. DEPLOY MODELS SAFELY TO PROD § Deploy from CLI or Jupyter Notebook § Tear-Down or Rollback Models Quickly § Shadow Canary Deploy: ie. 20% Live Traffic § Split Canary Deploy: ie. 97-2-1% Live Traffic
      Start Production Model Cluster A:
        pipeline predict-cluster-start --model-runtime=tfserving_gpu --model-type=tensorflow --model-name=mnist --model-tag=A --traffic-split=97
      Start Production Model Cluster B:
        pipeline predict-cluster-start --model-runtime=tflite --model-type=tensorflow --model-name=mnist --model-tag=B --traffic-split=2
      Start Production Model Cluster C:
        pipeline predict-cluster-start --model-runtime=tensorrt --model-type=tensorflow --model-name=mnist --model-tag=C --traffic-split=1
  • 19. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  • 20. COMPARE MODELS OFFLINE & ONLINE § Offline, Batch Metrics § Validation + Training Accuracy § CPU + GPU Utilization § Live Prediction Values § Compare Relative Precision § Newly-Seen, Streaming Data § Online, Real-Time Metrics § Response Time, Throughput § Cost ($) Per Prediction
  • 21. VIEW REAL-TIME PREDICTION STREAM § Visually Compare Real-Time Predictions [Figure labels: Prediction Inputs; Prediction Results & Confidences; Model A, Model B, Model C]
  • 22. PREDICTION PROFILING AND TUNING § Pinpoint Performance Bottlenecks § Fine-Grained Prediction Metrics § 3 Steps in Real-Time Prediction 1. transform_request() 2. predict() 3. transform_response()
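The three prediction steps named on slide 22 can be timed individually to see where latency goes. The sketch below is illustrative only: the step bodies are placeholders (a JSON request and an averaging "model"), and `timed` and `profiled_predict` are hypothetical helpers, not PipelineAI APIs.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


def transform_request(raw: bytes) -> list:
    """Step 1: decode the raw request into model inputs (placeholder format)."""
    return json.loads(raw)["inputs"]


def predict(inputs: list) -> float:
    """Step 2: placeholder model that averages its inputs."""
    return sum(inputs) / len(inputs)


def transform_response(prediction: float) -> bytes:
    """Step 3: encode the prediction as a JSON response."""
    return json.dumps({"outputs": prediction}).encode("utf-8")


def timed(step: Callable[[Any], Any], arg: Any, timings: Dict[str, float]) -> Any:
    """Run one step and record its wall-clock duration under the step's name."""
    start = time.perf_counter()
    result = step(arg)
    timings[step.__name__] = time.perf_counter() - start
    return result


def profiled_predict(raw: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run all three steps and return the response plus per-step timings in seconds."""
    timings: Dict[str, float] = {}
    inputs = timed(transform_request, raw, timings)
    prediction = timed(predict, inputs, timings)
    response = timed(transform_response, prediction, timings)
    return response, timings
```

Calling `profiled_predict(b'{"inputs": [1, 2, 3]}')` returns the encoded response together with a dictionary keyed by step name, which makes it easy to tell whether serialization or the model itself dominates.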
  • 23. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  • 24. LIVE, ADAPTIVE TRAFFIC ROUTING § A/B Tests § Inflexible and Boring § Multi-Armed Bandits § Adaptive and Exciting!
      Adjust Traffic Routing Dynamically:
        pipeline traffic-router-split --model-type=tensorflow --model-name=mnist --model-tag-list=[A,B,C] --model-weight-list=[1,2,97]
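Slide 24 does not say which bandit algorithm is used, so the following is only a sketch of one simple option, epsilon-greedy, expressed as a traffic split: the model with the best observed reward receives most of the traffic and the remainder is spread evenly over the other models for exploration. The function name and the reward values in the usage note are assumptions.

```python
from typing import Dict


def epsilon_greedy_weights(rewards: Dict[str, float], epsilon: float = 0.1) -> Dict[str, int]:
    """Turn observed per-model rewards into integer traffic percentages summing to 100."""
    if not rewards:
        raise ValueError("at least one model is required")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")
    best = max(rewards, key=rewards.get)
    others = [tag for tag in rewards if tag != best]
    if not others:
        return {best: 100}
    # Each non-winning model gets an equal share of the exploration budget.
    explore_each = int(round(100 * epsilon / len(others)))
    weights = {tag: explore_each for tag in others}
    # The winner takes whatever remains, so the total is always exactly 100.
    weights[best] = 100 - explore_each * len(others)
    return weights
```

With hypothetical rewards `{"A": 0.031, "B": 0.042, "C": 0.038}` and `epsilon=0.1`, model B receives 90% and A and C receive 5% each. Re-running this as new rewards arrive is what makes the split adaptive rather than a fixed A/B allocation.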
  • 25. SHIFT TRAFFIC TO MAX(REVENUE) § Shift Traffic to Winning Model using AI Bandit Algos
  • 26. SHIFT TRAFFIC TO MIN(CLOUD CO$T) § Based on Cost ($) Per Prediction § Cost Changes Throughout Day § Lose AWS Spot Instances § Google Cloud Becomes Cheaper § Shift Across Clouds & On-Prem
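The "Cost ($) Per Prediction" idea on slide 26 reduces to a small calculation. The sketch below assumes cost is derived from an hourly instance price and a sustained prediction throughput; both inputs and the function names are hypothetical, and the slides do not specify how PipelineAI measures cost.

```python
from typing import Dict, Tuple


def cost_per_prediction(hourly_cost_usd: float, predictions_per_sec: float) -> float:
    """Dollars per prediction, given an hourly price and sustained throughput."""
    if predictions_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return hourly_cost_usd / (predictions_per_sec * 3600.0)


def cheapest_target(targets: Dict[str, Tuple[float, float]]) -> str:
    """Pick the target name with the lowest cost per prediction.

    `targets` maps a name to (hourly_cost_usd, predictions_per_sec).
    """
    return min(targets, key=lambda name: cost_per_prediction(*targets[name]))
```

Because spot prices and throughput change during the day, a router would re-evaluate `cheapest_target` periodically and shift traffic when the answer changes.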
  • 27. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  • 28. LIVE, CONTINUOUS MODEL TRAINING § The Holy Grail of Machine Learning § Q1 2018: PipelineAI Supports Continuous Model Training! § Kafka, Kinesis § Spark Streaming
  • 29. PSEUDO-CONTINUOUS TRAINING § Identify and Fix Borderline Predictions (~50-50% Confidence) § Fix Along Class Boundaries § Retrain Newly-Labeled Data § Game-ify Labeling Process § Enable Crowd Sourcing
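Finding the borderline predictions that slide 29 suggests relabeling can be sketched as a simple filter. This assumes a binary classifier that reports a confidence between 0 and 1; the `margin` parameter and the function name are hypothetical.

```python
from typing import List, Tuple


def borderline(predictions: List[Tuple[str, float]], margin: float = 0.05) -> List[str]:
    """Return the ids of predictions whose confidence lies within `margin` of 0.5."""
    return [pid for pid, confidence in predictions if abs(confidence - 0.5) <= margin]
```

The returned ids are the examples closest to the class boundary, which are the ones worth sending to human labelers before retraining.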
  • 31. AGENDA Part 0: Latest PipelineAI Research Part 1: PipelineAI + Kubernetes + Istio
  • 32. SPECIAL THANKS TO CHRISTIAN POSTA § http://blog.christianposta.com/istio-workshop/slides/
  • 33. KUBERNETES INGRESS § Single Service § Can also use Service (LoadBalancer or NodePort) § Fan Out & Name-Based Virtual Hosting § Route Traffic Using Path or Host Header § Reduces # of load balancers needed § 404 Implemented as default backend § Federation / Hybrid-Cloud § Creates Ingress objects in every cluster § Monitors health and capacity of pods within each cluster § Routes clients to appropriate backend anywhere in federation
      Fan Out (Path):
        apiVersion: extensions/v1beta1
        kind: Ingress
        metadata:
          name: gateway-fanout
          annotations:
            kubernetes.io/ingress.class: istio
        spec:
          rules:
          - host: foo.bar.com
            http:
              paths:
              - path: /foo
                backend:
                  serviceName: s1
                  servicePort: 80
              - path: /bar
                backend:
                  serviceName: s2
                  servicePort: 80
      Virtual Hosting:
        apiVersion: extensions/v1beta1
        kind: Ingress
        metadata:
          name: gateway-virtualhost
          annotations:
            kubernetes.io/ingress.class: istio
        spec:
          rules:
          - host: foo.bar.com
            http:
              paths:
              - backend:
                  serviceName: s1
                  servicePort: 80
          - host: bar.foo.com
            http:
              paths:
              - backend:
                  serviceName: s2
                  servicePort: 80
  • 34. KUBERNETES INGRESS CONTROLLER § Ingress Controller Types § Google Cloud: kubernetes.io/ingress.class: gce § Nginx: kubernetes.io/ingress.class: nginx § Istio: kubernetes.io/ingress.class: istio § Must Start Ingress Controller Manually § Just deploying Ingress is not enough § Not started by kube-controller-manager § Start Istio Ingress Controller
        kubectl apply -f $ISTIO_INSTALL_PATH/install/kubernetes/istio.yaml
  • 35. ISTIO ARCHITECTURE: ENVOY § Lyft Project § High-perf Proxy (C++) § Lots of Metrics § Zone-Aware § Service Discovery § Load Balancing § Fault Injection, Circuits § %-based Traffic Split, Shadow § Sidecar Pattern § Rate Limiting, Retries, Outlier Detection, Timeout with Budget, …
  • 36. ISTIO ARCHITECTURE: MIXER § Enforce Access Control § Evaluate Request-Attrs § Collect Metrics § Platform-Independent § Extensible Plugin Model
  • 37. ISTIO ARCHITECTURE: PILOT § Envoy service discovery § Intelligent routing § A/B Tests § Canary deployments § RouteRule->Envoy conf § Propagates to sidecars § Supports Kube, Consul, ...
  • 38. ISTIO ARCHITECTURE: ISTIO-AUTH § Mutual TLS Auth § Credential management § Uses Service-identity § Canary deployments § Fine-grained ACLs § Attribute & role-based § Auditing & monitoring
  • 39. ISTIO ROUTE RULES § Kubernetes Custom Resource Definition (CRD)
        kind: CustomResourceDefinition
        metadata:
          name: routerules.config.istio.io
        spec:
          group: config.istio.io
          names:
            kind: RouteRule
            listKind: RouteRuleList
            plural: routerules
            singular: routerule
          scope: Namespaced
          version: v1alpha2
  • 40. A/B & BANDIT MODEL TESTING § Live Experiments in Production § Compare Existing Model A with Model B, Model C § Safe Split-Canary Deployment § Tip: Keep Ingress Simple – Use Route Rules Instead!
        apiVersion: config.istio.io/v1alpha2
        kind: RouteRule
        metadata:
          name: live-experiment-20-5-75
        spec:
          destination:
            name: predict-mnist
          precedence: 2 # Greater than global deny-all
          route:
          - labels:
              version: A
            weight: 20 # 20% still routes to model A
          - labels:
              version: B # 5% routes to new model B
            weight: 5
          - labels:
              version: C # 75% routes to new model C
            weight: 75

        apiVersion: config.istio.io/v1alpha2
        kind: RouteRule
        metadata:
          name: live-experiment-1-2-97
        spec:
          destination:
            name: predict-mnist
          precedence: 2 # Greater than global deny-all
          route:
          - labels:
              version: A
            weight: 1 # 1% routes to model A
          - labels:
              version: B # 2% routes to new model B
            weight: 2
          - labels:
              version: C # 97% routes to new model C
            weight: 97

        apiVersion: config.istio.io/v1alpha2
        kind: RouteRule
        metadata:
          name: live-experiment-97-2-1
        spec:
          destination:
            name: predict-mnist
          precedence: 2 # Greater than global deny-all
          route:
          - labels:
              version: A
            weight: 97 # 97% still routes to model A
          - labels:
              version: B # 2% routes to new model B
            weight: 2
          - labels:
              version: C # 1% routes to new model C
            weight: 1
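The three RouteRule manifests on slide 40 differ only in their name and weights, so they could be generated rather than hand-written. The sketch below builds the same structure as a Python dictionary and serializes it as JSON, which is also valid YAML. It assumes the percentage weights should total 100, as they do in all three examples on the slide; the `route_rule` helper is hypothetical and not part of Istio or PipelineAI.

```python
import json
from typing import Dict


def route_rule(name: str, destination: str, weights: Dict[str, int], precedence: int = 2) -> dict:
    """Build a v1alpha2 RouteRule manifest that splits traffic by version label."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return {
        "apiVersion": "config.istio.io/v1alpha2",
        "kind": "RouteRule",
        "metadata": {"name": name},
        "spec": {
            "destination": {"name": destination},
            "precedence": precedence,
            # One route entry per model version, in the order given.
            "route": [
                {"labels": {"version": version}, "weight": weight}
                for version, weight in weights.items()
            ],
        },
    }


if __name__ == "__main__":
    rule = route_rule("live-experiment-97-2-1", "predict-mnist", {"A": 97, "B": 2, "C": 1})
    print(json.dumps(rule, indent=2))
```

Validating the weights before emitting the manifest catches a mistyped split before it reaches the cluster.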
  • 41. ISTIO AUTO-SCALING § Traffic Routing and Auto-Scaling Occur Independently § Istio Continues to Obey Traffic Splits After Auto-Scaling § Auto-Scaling May Occur In Response to New Traffic Route
  • 42. ADVANCED ROUTING RULES § Content-based Routing § Uses headers, username, payload, … § Cross-Environment Routing § Shadow traffic prod => staging
  • 43. ISTIO DESTINATION POLICIES § Load Balancing § ROUND_ROBIN (default) § LEAST_CONN (between 2 randomly-selected hosts) § RANDOM § Circuit Breaker § Max connections § Max requests per conn § Consecutive errors § Penalty timer (15 mins) § Scan windows (5 mins)
        circuitBreaker:
          simpleCb:
            maxConnections: 100
            httpMaxRequests: 1000
            httpMaxRequestsPerConnection: 10
            httpConsecutiveErrors: 7
            sleepWindow: 15m
            httpDetectionInterval: 5m
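To make the "Consecutive errors" and "Penalty timer" settings on slide 43 concrete, here is a deliberately simplified circuit breaker. It is a conceptual sketch, not Envoy's or Istio's implementation: it tracks a single backend, ignores the detection interval and connection limits, and the class name and injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time
from typing import Callable, Optional


class ConsecutiveErrorBreaker:
    """Stop sending requests after N consecutive errors, then retry after a sleep window."""

    def __init__(
        self,
        max_consecutive_errors: int = 7,
        sleep_window_sec: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_errors = max_consecutive_errors
        self._sleep_window_sec = sleep_window_sec
        self._clock = clock
        self._errors = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request may be sent to the backend right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._sleep_window_sec:
            # Penalty period is over: close the breaker and start counting afresh.
            self._opened_at = None
            self._errors = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Record the outcome of a request; open the breaker on too many errors in a row."""
        if success:
            self._errors = 0
            return
        self._errors += 1
        if self._errors >= self._max_errors:
            self._opened_at = self._clock()
```

The defaults mirror the slide's values (7 consecutive errors, a 15-minute sleep window). A single success resets the error count, so only an unbroken run of failures trips the breaker.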
  • 44. ISTIO EGRESS § Whitelisted Domains Accessible Within Service Mesh § Apply Route Rules and Destination Policies § Supports TLS, HTTP, GRPC
        kind: EgressRule
        metadata:
          name: foo-egress-rule
        spec:
          destination:
            service: api.pipeline.ai
          ports:
          - port: 80
            protocol: http
          - port: 443
            protocol: https
  • 45. ISTIO & CHAOS + LATENCY MONKEYS § Fault Injection § Delay § Abort
      Abort:
        kind: RouteRule
        metadata:
          name: predict-mnist
        spec:
          destination:
            name: predict-mnist
          httpFault:
            abort:
              httpStatus: 420
              percent: 100
      Delay:
        kind: RouteRule
        metadata:
          name: predict-mnist
        spec:
          destination:
            name: predict-mnist
          httpFault:
            delay:
              fixedDelay: 7.000s
              percent: 100
  • 46. ISTIO METRICS AND MONITORING § Verify Traffic Splits § Fine-Grained Request Tracing
  • 47. ISTIO SECURITY § Istio Certificate Authority § Mutual TLS
  • 48. AGENDA Part 0: Latest PipelineAI Research Part 1: PipelineAI + Kubernetes + Istio
  • 49. THANK YOU!! § https://github.com/PipelineAI/pipeline/ § Please Star 🌟 this GitHub Repo! § Reminder: VC’s Value GitHub Stars @ $1,500 Each (!!) Contact Me chris@pipeline.ai @cfregly