Scaling Spark on Kubernetes
Li Gao
Introduction
#UnifiedAnalytics #SparkAISummit
Li Gao
Works in the Data Platform team at Lyft, currently leading
multiple Data Infra initiatives within Data Platform, including
the Spark on Kubernetes project.
Previously held tech leadership roles at Salesforce, Fitbit,
Groupon, and other startups.
Agenda
● Introduction to the Data Landscape at Lyft
● The challenges we face
● How Apache Spark on Kubernetes can help
● Remaining work
Data Landscape
● Batch data Ingestion and ETL
● Data Streaming
● ML platforms
● Notebooks and BI tools
● Query and Visualization
● Operational Analytics
● Data Discovery & Lineage
● ML workflow orchestration
● Cloud Platforms
Data Infrastructure @ Lyft
Business Analysts, Data Scientists, AML
Cloud Infra Services
External and Internal Products and Services
Data Portal (Discovery, WF, SLx dashboard, etc.)
Data Infra
Batch Compute @ Lyft
Batch Compute Clusters
Events
Ext Data
RDB/KV
Sys Events
Ingest Pipelines
AWS S3
AWS S3
HMS
Presto, Hive Client, and BI Tools
Analysts
Engineers
Scientists
Services
Evolving Batch Architecture
● 2016-2017: Vendor-based Hadoop
● 2017-2018: Hive on MR, Vendor Presto
● Mid 2018: Hive on Tez + Spark Adhoc
● Late 2018: Spark on Vendor GA
● Early-Mid 2019: Spark on K8s Alpha
● Future: Spark on K8s Beta & Preprod
Initial Batch Architecture
Batch Compute Challenges
● Third-party vendor dependency limitations
● Data ETL expressed solely in SQL, sometimes in complex, hard-to-maintain SQL
● Complex logic expressed in Python that is hard to port to SQL
● Different dependencies and versions
● Resource load balancing for heterogeneous workloads
Is SQL the complete solution?
How can Spark help?
RDB/KV
Applications APIs
Environments
Data Sources and Data Sinks
What challenges remain?
● Per job custom dependencies and security context
isolation
● Multi-version runtime requirements (Py3 vs. Py2, Spark versions)
● Still need to run on shared clusters for cost efficiency
● Mixed ML and ETL workloads
How can Kubernetes help?
Operators & Controllers
Pods, Ingress, Services
Namespaces
Pods
Immutability
Event driven & Declarative
Vibrant CNCF Community
Service Mesh
Multi-Tenancy Support
Image Registry
CNCF Landscape
What challenges still remain?
● Spark on k8s is still in its early days
● Single cluster scaling limit
● CRD operator choking limit
● Cluster control plane rollout pain points
● Pod churn and IP allocations throttling
● Default k8s scheduler limit
● ECR container registry reliability
Current scale
● Tens of PB in the data lake
● O(100k) batch jobs running daily
● ~1000s of EC2 nodes spanning multiple clusters and AZs
● ~1000s of workflows running daily
How Lyft scales Spark on K8s
# of Clusters
# of Namespaces
# of Pods
Pod Churn Rate
# of Nodes
Pod Size
Job:Pod ratio
IP Alloc Rate Limit
ECR Rate Limit
Affinity & Isolation
QoS & Quota
Pod Scheduler
The Evolving Architecture
The Start of a Spark Job @ k8s
Resource Labels
Job
Cluster Pool
Cluster
Namespace Group
Namespace
Spark CRD
K8s Pods
● (1) and (2) Dispatcher Gateway
● (3) Cluster Controller
● (4) Job Scheduler
● (5) Namespace Group Controller
● (6) Spark Operator
● (7) K8s Pod Scheduler
Multiple Clusters
HA in Cluster Pool
Cluster 1
Cluster 2
Cluster 3
Cluster Pool A
Cluster 4
● Cluster rotation within a cluster pool
● Automated provisioning of a new cluster, which is then (manually) added into rotation
● Throttle at the lower bound while a rotation is in progress
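The rotation and throttling bullets above can be sketched as follows. This is a minimal illustration, not Lyft's implementation: the `Cluster` and `ClusterPool` classes, the `normal_limit` and `lower_limit` caps, and the least-loaded placement rule are all assumptions made for the example. It reads "throttle at the lower bound" as "apply a reduced pool-wide job cap while any cluster is out of rotation".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    in_rotation: bool = True  # False while the cluster is being rotated out
    running_jobs: int = 0


@dataclass
class ClusterPool:
    clusters: List[Cluster]
    normal_limit: int = 300  # hypothetical pool-wide cap on running jobs
    lower_limit: int = 150   # hypothetical reduced cap used during a rotation

    def rotation_in_progress(self) -> bool:
        return any(not c.in_rotation for c in self.clusters)

    def job_limit(self) -> int:
        # Throttle at the lower bound while any cluster is out of rotation.
        return self.lower_limit if self.rotation_in_progress() else self.normal_limit

    def submit(self) -> Optional[str]:
        """Place one job on the least-loaded in-rotation cluster.

        Returns the chosen cluster name, or None when the pool is throttled.
        """
        active = [c for c in self.clusters if c.in_rotation]
        total = sum(c.running_jobs for c in self.clusters)
        if not active or total >= self.job_limit():
            return None
        target = min(active, key=lambda c: c.running_jobs)
        target.running_jobs += 1
        return target.name
```

A caller would treat a `None` result as a signal to queue the job and retry once the rotation completes or running jobs drain.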
Multiple Namespaces (Groups)
Pod Pod Pod
Namespace 1
Pod Pod Pod
Namespace 2
Pod Pod Pod
Namespace 3
Node A Node B Node C Node D
Role1 Role1 Role2
Max Pod Size 1 Max Pod Size 2
● In practice, we observed a limit of ~3K active pods per namespace
● Less preemption is required when namespaces are isolated by quota
● Different namespaces can map to different IAM roles and sidecar configurations for security and auditing purposes
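One way to picture namespace selection within a namespace group is sketched below. It is illustrative only: the `Namespace` fields, the `pick_namespace` helper, and the least-loaded tie-break are assumptions for the example. Only the ~3K active pod figure, the per-namespace IAM role, and the per-namespace maximum pod size come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_ACTIVE_PODS = 3000  # the ~3K per-namespace figure quoted on the slide


@dataclass
class Namespace:
    name: str
    iam_role: str
    max_pod_memory_gb: int  # stands in for the "Max Pod Size" of the namespace
    active_pods: int = 0


def pick_namespace(
    namespaces: List[Namespace],
    iam_role: str,
    pod_memory_gb: int,
    pods_needed: int,
) -> Optional[Namespace]:
    """Return the least-loaded namespace that fits the job, or None."""
    candidates = [
        ns
        for ns in namespaces
        if ns.iam_role == iam_role
        and pod_memory_gb <= ns.max_pod_memory_gb
        and ns.active_pods + pods_needed <= MAX_ACTIVE_PODS
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda ns: ns.active_pods)
```

Filtering on the IAM role first keeps jobs with different security contexts in different namespaces, which is the isolation property the bullets describe.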
Pod Sharing
Job Controller
Spark Driver Pod
Spark Exec Pods
Job 2 Driver Pod
Job 2 Exec Pods
Job 3 Driver Pod
Job 3 Exec Pods
Shared Pods
Job 1
Job 4
Job 3
Job 2
AWS S3
Dep
Dep
Dedicated & Isolated Pods
Dep
Separate DML from DDL
DDL Separation to reduce churn
Pod Priority and Preemption (WIP)
● Priority-based preemption
● Driver pod has higher priority than executor pod
● Experimental
D1 D2 E1 E2 E3 E4
K8s Scheduler
D1
E5
New Pod Req
Before
D2 E5 E2 E3 E4
After
E1
Evicted
https://github.com/kubernetes/kubernetes/issues/71486
https://github.com/kubernetes/enhancements/issues/564
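The idea that drivers outrank executors can be illustrated with a toy admission function. This is a simplified model, not the Kubernetes scheduler: the priority values, the `Pod` class, the fixed `capacity`, and the `admit` helper are hypothetical, and the real scheduler weighs many more factors when choosing a victim.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DRIVER_PRIORITY = 1000   # hypothetical values; only the ordering matters
EXECUTOR_PRIORITY = 100


@dataclass(frozen=True)
class Pod:
    name: str
    priority: int


def admit(
    running: List[Pod], new_pod: Pod, capacity: int
) -> Tuple[List[Pod], Optional[Pod]]:
    """Return (pods running after admission, evicted pod or None)."""
    if len(running) < capacity:
        return running + [new_pod], None
    victim = min(running, key=lambda p: p.priority)
    if victim.priority >= new_pod.priority:
        # Nothing with strictly lower priority to preempt: the new pod stays pending.
        return running, None
    survivors = [p for p in running if p is not victim]
    return survivors + [new_pod], victim
```

With a full node, a new driver pod evicts one executor, while a new executor pod waits, so a job's driver is never sacrificed for another job's executors.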
Taints and Tolerations (WIP)
Node A Node B Node C Node D Node E Node F
P1 P2 P3 P4 P5 P6 P7 P7 P8 P9 P10
Controllers and Watchers Job 1 Job 2
Core Nodes (Taint) Worker Nodes (Taint)
● Other considerations: node labels and node selectors to separate GPU- and CPU-based workloads
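The split between tainted core nodes and tainted worker nodes relies on Kubernetes toleration matching, which can be approximated as below. The matching rules follow the documented `Equal` and `Exists` operators. The `node-pool` taint key and its `core` and `worker` values are hypothetical names chosen for the example, and `PreferNoSchedule` taints are ignored for simplicity.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Approximate Kubernetes toleration matching for a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def schedulable(pod_tolerations: List[Toleration], node_taints: List[Taint]) -> bool:
    """A pod may land on a node only if every hard taint is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in hard)
```

Under this scheme, Spark job pods would carry only the worker toleration, while controllers and watchers would carry only the core one.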
Mutating Admission Hooks
Pod Request
K8s API HTTP Handler
Authn & Authz
Mutating admission controllers (mutating admission webhooks)
Schema validation
Validating admission controllers (validating admission webhooks)
ETCD
k8s pod scheduler
kubelet, Node: Spark Pod
kubelet, Node: Spark Pod, sidecars, config
credit: https://banzaicloud.com/blog/k8s-admission-webhooks/
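A mutating admission webhook answers the API server with an AdmissionReview whose response carries a base64-encoded JSON Patch. The sketch below builds such a response for sidecar injection. The `mutate` function name, the sidecar container, and the choice to append at `/spec/containers/-` are assumptions for illustration. A real webhook also needs an HTTPS server and a registered `MutatingWebhookConfiguration`, which are omitted here.

```python
import base64
import json
from typing import Any, Dict


def mutate(review: Dict[str, Any], sidecar: Dict[str, Any]) -> Dict[str, Any]:
    """Build an AdmissionReview response that appends `sidecar` to the pod."""
    request = review["request"]
    # JSON Patch: "-" appends to the end of the containers array.
    patch = [{"op": "add", "path": "/spec/containers/-", "value": sidecar}]
    return {
        # Echo the API version the server used for the request.
        "apiVersion": review.get("apiVersion", "admission.k8s.io/v1"),
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

Because the patch is applied before schema validation and the validating webhooks run, the injected sidecar is checked like any other part of the pod spec.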
Custom k8s Pod scheduler for batch (WIP)
Default k8s scheduler: All Active Nodes, Predicates, Priorities, Round Robin
Dynamic Policy Driven k8s scheduler: All Active Nodes, Predicates, Weight Engine, Placement Engine, Policies
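The policy-driven pipeline (predicates, then a weight engine, then a placement engine) can be sketched roughly as below. Everything here is an assumption made to show the shape of the idea: the `Node` and `PodSpec` types, the two predicates, the `bin_pack` and `spread` scorers, and the policy format are hypothetical and are not taken from Lyft's scheduler.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class PodSpec:
    cpu: float
    node_selector: Dict[str, str] = field(default_factory=dict)


def fits_cpu(node: Node, pod: PodSpec) -> bool:
    return node.free_cpu >= pod.cpu


def matches_selector(node: Node, pod: PodSpec) -> bool:
    return all(node.labels.get(k) == v for k, v in pod.node_selector.items())


# Predicates filter out nodes that cannot run the pod at all.
PREDICATES: List[Callable[[Node, PodSpec], bool]] = [fits_cpu, matches_selector]

# Scorers rank the feasible nodes; a policy assigns a weight to each scorer.
SCORERS: Dict[str, Callable[[Node, PodSpec], float]] = {
    "bin_pack": lambda node, pod: -(node.free_cpu - pod.cpu),  # prefer fuller nodes
    "spread": lambda node, pod: node.free_cpu - pod.cpu,       # prefer emptier nodes
}


def place(nodes: List[Node], pod: PodSpec, policy: Dict[str, float]) -> Optional[Node]:
    """Filter by predicates, weigh by policy, and return the best node or None."""
    feasible = [n for n in nodes if all(p(n, pod) for p in PREDICATES)]
    if not feasible:
        return None

    def weight(node: Node) -> float:
        return sum(w * SCORERS[name](node, pod) for name, w in policy.items())

    return max(feasible, key=weight)
```

Swapping the policy dictionary changes placement behavior without touching the predicates, which is the appeal of a policy-driven design for batch workloads.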
What about ECR reliability?
Node 1 Node 2 Node 3
Pods Pods Pods
DaemonSet + Docker In Docker
ECR Container Images
Spark Job Config Overlays (DML)
Cluster Pool Defaults
Cluster Defaults
Spark Job User Specified Config
Cluster and Namespace Overrides
Final Spark Job Config
Config Composer & Event Watcher
Spark Operator
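The overlay idea can be sketched as a layered merge in which later layers override earlier ones. The precedence used here (pool defaults, then cluster defaults, then user config, then cluster and namespace overrides) follows the order the layers are listed on the slide and is an assumption, as are the `compose` helper and the sample values. The property names are standard Spark configuration keys.

```python
from typing import Dict

Conf = Dict[str, str]


def compose(*layers: Conf) -> Conf:
    """Merge config layers left to right; later layers win on conflicts."""
    final: Conf = {}
    for layer in layers:
        final.update(layer)
    return final


# Hypothetical sample layers.
pool_defaults = {"spark.executor.memory": "4g", "spark.executor.instances": "2"}
cluster_defaults = {"spark.kubernetes.container.image": "example/spark:2.4"}
user_config = {"spark.executor.instances": "10", "spark.kubernetes.namespace": "adhoc"}
overrides = {"spark.kubernetes.namespace": "spark-ns-1"}

final_conf = compose(pool_defaults, cluster_defaults, user_config, overrides)
```

Placing the cluster and namespace overrides last means a user can tune job sizing but cannot choose where the job lands.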
X-Ray of Job Controller
Controllers & Watchers
• Job scheduler
• Spark job config composer
• Namespace group controller
• k8s pod scheduler
• Service controllers (STS, Jupyter)
• K8s metrics & events watchers
• Spark job/crd events & metrics watchers
X-Ray of Spark Operator
Monitoring and Logging Toolbox
JMX
Provision & Automation
Kustomize Template
K8S Deploy
Sidecar injectors
Secrets injectors
DaemonSets
KIAM
Remaining work
● More intelligent & efficient job routing, scheduler and parameter composer
● End-to-End serverless, self-serviceable, and user-oriented data compute mesh
● Fine grained cost attribution
● Improved docker image distribution
● Spark 3.0 & Kubernetes v1.14+
Key Takeaways
● Apache Spark can help unify different batch data compute use cases
● Kubernetes can help solve the dependency and multi-version requirements using its containerized approach
● Spark on Kubernetes can scale significantly by using a multi-cluster compute mesh approach with proper resource isolation and scheduling techniques
● Challenges remain when running Spark on Kubernetes at scale
Community
This effort would not have been possible without help from the open source and wider communities.
Q&A
Li Gao in/ligao101
Monitoring Example - OOM Kill
What about dependencies?
RTree Libraries
Data Codecs
Spatial Libraries
3rd Party Vendor Limitations
● Proprietary patches
● Inconsistent bootstrap
● Release schedule
● Homogeneous environments
What about Python functions?
“I want to express my processing logic in python functions with
external geo libraries (i.e. Geomesa) and interact with Hive
tables” --- Lyft data engineer

Weitere ähnliche Inhalte

Was ist angesagt?

Continuous Processing in Structured Streaming with Jose Torres
 Continuous Processing in Structured Streaming with Jose Torres Continuous Processing in Structured Streaming with Jose Torres
Continuous Processing in Structured Streaming with Jose Torres
Databricks
 

Was ist angesagt? (20)

Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
 
Continuous Processing in Structured Streaming with Jose Torres
 Continuous Processing in Structured Streaming with Jose Torres Continuous Processing in Structured Streaming with Jose Torres
Continuous Processing in Structured Streaming with Jose Torres
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Building Value Within the Heavy Vehicle Industry Using Big Data and Streaming...
Building Value Within the Heavy Vehicle Industry Using Big Data and Streaming...Building Value Within the Heavy Vehicle Industry Using Big Data and Streaming...
Building Value Within the Heavy Vehicle Industry Using Big Data and Streaming...
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Sqoop on Spark for Data Ingestion-(Veena Basavaraj and Vinoth Chandar, Uber)
Sqoop on Spark for Data Ingestion-(Veena Basavaraj and Vinoth Chandar, Uber)Sqoop on Spark for Data Ingestion-(Veena Basavaraj and Vinoth Chandar, Uber)
Sqoop on Spark for Data Ingestion-(Veena Basavaraj and Vinoth Chandar, Uber)
 
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Standalone Spark Deployment for Stability and Performance
Standalone Spark Deployment for Stability and PerformanceStandalone Spark Deployment for Stability and Performance
Standalone Spark Deployment for Stability and Performance
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
Scaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at LyftScaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at Lyft
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 
#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 

Ähnlich wie SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft

Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
Databricks
 

Ähnlich wie SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft (20)

Scaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftScaling spark on kubernetes at Lyft
Scaling spark on kubernetes at Lyft
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Scalable Clusters On Demand
Scalable Clusters On DemandScalable Clusters On Demand
Scalable Clusters On Demand
 
Spark on Yarn @ Netflix
Spark on Yarn @ NetflixSpark on Yarn @ Netflix
Spark on Yarn @ Netflix
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
Microservices at ibotta pitfalls and learnings
Microservices at ibotta pitfalls and learningsMicroservices at ibotta pitfalls and learnings
Microservices at ibotta pitfalls and learnings
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 

Mehr von Chester Chen

zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
Chester Chen
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
Chester Chen
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
Chester Chen
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
Chester Chen
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
Chester Chen
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
Chester Chen
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
Chester Chen
 

Mehr von Chester Chen (20)

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 
Index conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathIndex conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreath
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft

  • 2. Introduction 2#UnifiedAnalytics #SparkAISummit Li Gao Works in the Data Platform team at Lyft, currently leading multiple Data Infra initiatives within Data Platform, including the Spark on Kubernetes project. Previously held tech leadership roles at Salesforce, Fitbit, Groupon, and other startups.
  • 3. Agenda 3#UnifiedAnalytics #SparkAISummit ● Introduction of Data Landscape at Lyft ● The challenges we face ● How Apache Spark on Kubernetes can help ● Remaining work
  • 4. Data Landscape 4#UnifiedAnalytics #SparkAISummit ● Batch data Ingestion and ETL ● Data Streaming ● ML platforms ● Notebooks and BI tools ● Query and Visualization ● Operational Analytics ● Data Discovery & Lineage ● ML workflow orchestration ● Cloud Platforms
  • 5. Business Analysts, Data Scientists, AML Cloud Infra Services Data Infrastructure @ Lyft 5 External and Internal Products and Services Data Portal (Discovery, WF, SLx dashboard etc. Data Infra
  • 6. Batch Compute Clusters Batch Compute @ Lyft 6 Events Ext Data RDB/KV Sys Events IngestPipelines AWSS3 AWSS3 HMS Presto,HiveClient,andBITools Analysts Engineers Scientists Services
  • 7. Evolving Batch Architecture 7 Future2016-2017 Vendor-based Hadoop 2017-2018 Hive on MR Vendor Presto Mid 2018 Hive on Tez + Spark Adhoc Late 2018 Spark on Vendor GA Early-Mid 2019 Spark on K8s Alpha Spark on K8s Beta & Preprod
  • 9. Batch Compute Challenges 9 ● 3rd Party vendor dependency limitations ● Data ETL expressed solely in SQL, and sometimes in hard-maintain complex SQL ● Complex logic expressed in Python that hard to adopt in SQL ● Different dependencies and versions ● Resource load balancing for heterogeneous workloads
  • 10. Is SQL the complete solution? 10
  • 11. How Spark can help? 11 RDB/KV Application s APIs Environments Data Sources and Data Sinks
  • 12. What challenges remain? 12 ● Per job custom dependencies and security context isolation ● Multi version runtime requirements (Py3 v.s. Py2, Spark versions) ● Still need to run on shared clusters for cost efficiency ● Mixed ML and ETL workloads
  • 13. How Kubernetes can help? 13 Operators & Controllers Pods Ingress Services Namespaces Pods Immutability Event driven & Declarative Vibrant CNCF Community ServiceMesh Multi-TenancySupport Image Registry
  • 15. What challenges still remain? ● Spark on k8s is still in its early days ● Single cluster scaling limit ● CRD operator choking limit ● Cluster control plane rollout pain points ● Pod churn and IP allocations throttling ● Default k8s scheduler limit ● ECR container registry reliability 15
  • 16. Current scale 16 ● 10s PB data lake ● (O) 100k batch jobs running daily ● ~ 1000s of EC2 nodes spanning multiple clusters and AZs ● ~ 1000s of workflows running daily
  • 17. How Lyft scales Spark on K8s # of Clusters # of Namespaces # of Pods Pod Churn Rate # of Nodes Pod Size Job:Pod ratio IP Alloc Rate Limit ECR Rate Limit Affinity & Isolation QoS & Quota Pod Scheduler
  • 19. A Start of a Spark Job @ k8s 19 Resource Labels Job Cluster Pool Cluster Namespace Group Namespace Spark CRD K8s Pods ● (1) and (2) Dispatcher Gateway ● (3) Cluster Controller ● (4) Job Scheduler ● (5) Namespace Group Controller ● (6) Spark Operator ● (7) K8s Pod Scheduler (1) (3) (4) (5) (6) (7) (2)
  • 21. HA in Cluster Pool 21 Cluster 1 Cluster 2 Cluster 3 Cluster Pool A Cluster 4 ● Cluster rotation within a cluster pool ● Automated provisioning of a new cluster and (manually) add into rotation ● Throttle at lower bound when rotation in progress
  • 22. Multiple Namespaces (Groups) 22 Pod Pod Pod Namespace 1 Pod Pod Pod Namespace 2 Pod Pod Pod Namespace 3 Node A Node B Node C Node D Role1 Role1 Role2 Max Pod Size 1 Max Pod Size 2 ● Practical ~3K active pods per namespace observed ● Less preemption required when namespace isolated by quota ● Different namespaces can map different IAM roles and sidecar configurations for security and auditing purposes
  • 23. Pod Sharing ● Diagram labels: Job Controller, Spark Driver Pod, Spark Exec Pods, Job 2 Driver Pod, Job 2 Exec Pods, Job 3 Driver Pod, Job 3 Exec Pods, Shared Pods, Job 1, Job 4, Job 3, Job 2, AWS S3, Dep, Dedicated & Isolated Pods
  • 25. DDL Separation to reduce churn
  • 26. Pod Priority and Preemption (WIP) ● Priority-based preemption ● Driver pod has higher priority than executor pod ● Experimental ● Diagram labels: D1 D2 E1 E2 E3 E4, K8s Scheduler, D1 E5, New Pod Req, Before, D2 E5 E2 E3 E4, After, E1 Evicted ● https://github.com/kubernetes/kubernetes/issues/71486 ● https://github.com/kubernetes/enhancements/issues/564
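A toy model may help illustrate priority-based preemption with drivers ranked above executors. It is a deliberate simplification and not the real Kubernetes scheduler: the priority values, the `Pod` record, the `admit` helper, and the assumption of equally sized pods on a single node are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical values; only the ordering (driver > executor) comes from the slide.
DRIVER_PRIORITY = 100
EXECUTOR_PRIORITY = 10


@dataclass(frozen=True)
class Pod:
    name: str
    priority: int


def admit(running: List[Pod], new_pod: Pod,
          capacity: int) -> Tuple[List[Pod], List[Pod]]:
    """Try to place new_pod on a node that holds `capacity` equally sized pods.

    Returns (running_after, evicted). A running pod is evicted only if it has
    strictly lower priority than new_pod; otherwise new_pod stays pending.
    """
    if len(running) < capacity:
        return running + [new_pod], []
    lower = [p for p in running if p.priority < new_pod.priority]
    if not lower:
        return list(running), []  # nothing can be preempted
    victim = min(lower, key=lambda p: p.priority)  # lowest priority goes first
    after = [p for p in running if p is not victim] + [new_pod]
    return after, [victim]


node = [Pod("D1", DRIVER_PRIORITY), Pod("D2", DRIVER_PRIORITY)] + [
    Pod(f"E{i}", EXECUTOR_PRIORITY) for i in range(1, 5)
]

after, evicted = admit(node, Pod("D3", DRIVER_PRIORITY), capacity=6)
print([p.name for p in after], [p.name for p in evicted])

after, evicted = admit(node, Pod("E5", EXECUTOR_PRIORITY), capacity=6)
print([p.name for p in after], [p.name for p in evicted])
```

In this sketch a new driver on a full node evicts one executor, while a new executor cannot evict its peers and remains pending.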
  • 27. Taints and Tolerations (WIP) ● Diagram labels: Nodes A-F, pods P1-P10, Controllers and Watchers, Job 1, Job 2, Core Nodes (Taint), Worker Nodes (Taint) ● Other considerations: node labels and node selectors to separate GPU- and CPU-based workloads
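The separation of core and worker nodes can be illustrated with a simplified matching rule: a pod may land on a node only if it tolerates every taint on that node and the node carries every label in the pod's node selector. The sketch below models taints as plain `(key, value)` pairs with an implied NoSchedule effect; the `pool` and `accelerator` keys and all pod and node names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Taint = Tuple[str, str]  # (key, value); a NoSchedule effect is assumed


@dataclass
class Node:
    name: str
    taints: FrozenSet[Taint] = frozenset()
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class PodSpec:
    name: str
    tolerations: FrozenSet[Taint] = frozenset()
    node_selector: Dict[str, str] = field(default_factory=dict)


def can_schedule(pod: PodSpec, node: Node) -> bool:
    """True if the pod tolerates all node taints and its selector matches."""
    tolerated = node.taints <= pod.tolerations
    selected = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    return tolerated and selected


core = Node("node-a", taints=frozenset({("pool", "core")}))
gpu_worker = Node("node-d", taints=frozenset({("pool", "worker")}),
                  labels={"accelerator": "gpu"})
cpu_worker = Node("node-e", taints=frozenset({("pool", "worker")}),
                  labels={"accelerator": "cpu"})

controller = PodSpec("job-controller", tolerations=frozenset({("pool", "core")}))
gpu_exec = PodSpec("spark-exec-gpu", tolerations=frozenset({("pool", "worker")}),
                   node_selector={"accelerator": "gpu"})

for pod in (controller, gpu_exec):
    eligible = [n.name for n in (core, gpu_worker, cpu_worker) if can_schedule(pod, n)]
    print(pod.name, "->", eligible)
```

Real Kubernetes tolerations also support operators and effects, which this sketch leaves out.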
  • 28. Mutating Admission Hooks ● Request flow: Pod Request → K8S API HTTP Handler → Authn & Authz → Mutating admission controllers (mutating admission webhooks) → Schema validation → Validating admission controllers (validating admission webhooks) → ETCD → k8s pod scheduler → kubelet → Node → Spark Pod (sidecars, config) ● Credit: https://banzaicloud.com/blog/k8s-admission-webhooks/
  • 29. Custom k8s Pod scheduler for batch (WIP) ● Default k8s scheduler: All Active Nodes, Predicates, Priorities, Round Robin ● Dynamic policy-driven k8s scheduler: All Active Nodes, Predicates, Weight Engine, Placement Engine, Policies
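One way to picture a policy-driven scheduler is as a filter step (predicates) followed by a weighted scoring step whose weights come from a swappable policy. The sketch below is a guess at that general shape, not the scheduler described in the talk: the node fields, the two scoring functions, and the weights are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Node = Dict[str, Any]
Pod = Dict[str, Any]
Predicate = Callable[[Node, Pod], bool]
Scorer = Callable[[Node, Pod], float]
Policy = List[Tuple[float, Scorer]]  # (weight, scoring function) pairs


def fits_cpu(node: Node, pod: Pod) -> bool:
    return node["free_cpu"] >= pod["cpu"]


def fits_mem(node: Node, pod: Pod) -> bool:
    return node["free_mem_gb"] >= pod["mem_gb"]


def pack_score(node: Node, pod: Pod) -> float:
    # Higher when the node would be fuller after placement (bin packing).
    return 1.0 - (node["free_cpu"] - pod["cpu"]) / node["total_cpu"]


def mem_headroom(node: Node, pod: Pod) -> float:
    # Higher when more memory would remain free after placement.
    return (node["free_mem_gb"] - pod["mem_gb"]) / node["total_mem_gb"]


def place(nodes: List[Node], pod: Pod, predicates: List[Predicate],
          policy: Policy) -> Optional[str]:
    """Filter nodes with predicates, then pick the best weighted score."""
    feasible = [n for n in nodes if all(pred(n, pod) for pred in predicates)]
    if not feasible:
        return None
    best = max(feasible, key=lambda n: sum(w * s(n, pod) for w, s in policy))
    return best["name"]


nodes = [
    {"name": "a", "total_cpu": 16, "free_cpu": 12, "total_mem_gb": 64, "free_mem_gb": 64},
    {"name": "b", "total_cpu": 16, "free_cpu": 4, "total_mem_gb": 64, "free_mem_gb": 8},
    {"name": "c", "total_cpu": 16, "free_cpu": 2, "total_mem_gb": 64, "free_mem_gb": 32},
]
pod = {"cpu": 4, "mem_gb": 8}
predicates = [fits_cpu, fits_mem]

packing_policy = [(0.7, pack_score), (0.3, mem_headroom)]
spreading_policy = [(0.3, pack_score), (0.7, mem_headroom)]

print(place(nodes, pod, predicates, packing_policy))    # prints "b"
print(place(nodes, pod, predicates, spreading_policy))  # prints "a"
```

Swapping the policy changes the placement without touching the predicates, which is the property a "dynamic, policy-driven" design would presumably aim for.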
  • 30. What about ECR reliability? ● Diagram labels: Node 1, Node 2, Node 3 (each running Pods), DaemonSet + Docker in Docker, ECR Container Images
  • 31. Spark Job Config Overlays (DML) ● Inputs: Cluster Pool Defaults, Cluster Defaults, Spark Job User Specified Config, Cluster and Namespace Overrides ● Output: Final Spark Job Config ● Components: Config Composer & Event Watcher, Spark Operator
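A config overlay of this kind can be sketched as a layered dictionary merge in which later layers win. The precedence order below (pool defaults, then cluster defaults, then user config, then cluster and namespace overrides) is an assumption read off the order of the diagram labels; the `compose_config` helper and all values, including the image and namespace names, are hypothetical. The property keys themselves are standard Spark settings.

```python
from typing import Dict


def compose_config(*layers: Dict[str, str]) -> Dict[str, str]:
    """Merge config layers left to right; later layers override earlier ones."""
    final: Dict[str, str] = {}
    for layer in layers:
        final.update(layer)
    return final


cluster_pool_defaults = {
    "spark.executor.memory": "4g",
    "spark.executor.instances": "2",
}
cluster_defaults = {
    "spark.kubernetes.container.image": "example/spark:latest",  # hypothetical image
}
user_config = {
    "spark.executor.memory": "8g",
    "spark.kubernetes.namespace": "my-namespace",
}
cluster_and_namespace_overrides = {
    "spark.kubernetes.namespace": "ns-group-a-1",  # hypothetical routed namespace
}

final_config = compose_config(
    cluster_pool_defaults,
    cluster_defaults,
    user_config,
    cluster_and_namespace_overrides,
)
for key in sorted(final_config):
    print(key, "=", final_config[key])
```

In this sketch the user's memory setting survives, while the namespace chosen by the platform replaces the one the user asked for.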
  • 32. X-Ray of Job Controller
  • 33. Controllers & Watchers ● Job scheduler ● Spark job config composer ● Namespace group controller ● k8s pod scheduler ● Service controllers (STS, Jupyter) ● K8s metrics & events watchers ● Spark job/CRD events & metrics watchers
  • 34. X-Ray of Spark Operator
  • 35. Monitoring and Logging Toolbox ● JMX
  • 36. Provision & Automation ● Kustomize Template ● K8S Deploy ● Sidecar injectors ● Secrets injectors ● DaemonSets ● KIAM
  • 37. Remaining work ● More intelligent & efficient job routing, scheduler, and parameter composer ● End-to-end serverless, self-serviceable, and user-oriented data compute mesh ● Fine-grained cost attribution ● Improved Docker image distribution ● Spark 3.0 & Kubernetes v1.14+
  • 38. Key Takeaways ● Apache Spark can help unify different batch data compute use cases ● Kubernetes can help solve the dependency and multi-version requirements using its containerized approach ● Spark on Kubernetes can scale significantly by using a multi-cluster compute mesh approach with proper resource isolation and scheduling techniques ● Challenges remain when running Spark on Kubernetes at scale
  • 39. Community ● This effort would not be possible without the help of the open source and wider communities.
  • 43. Monitoring Example - OOM Kill
  • 44. What about dependencies? ● RTree Libraries ● Data Codecs ● Spatial Libraries
  • 45. 3rd Party Vendor Limitations ● Proprietary patches ● Inconsistent bootstrap ● Release schedule ● Homogeneous environments
  • 46. What about Python functions? ● “I want to express my processing logic in Python functions with external geo libraries (e.g. GeoMesa) and interact with Hive tables” --- Lyft data engineer

Editor's notes

  1. Different users and use cases: ML, streaming, real-time, batch, notebooks; multiple cloud platforms
  2. Declarative, predictable & repeatable; operators add extensibility; multi-tenancy; container native
  3. CNCF is a vibrant community and supports numerous projects
  4. Patch rollout/updates for the CRD/control plane are still evolving; pod churn: etcd/resource/TTL/IP allocation in EC2, for example
  5. What does "homogeneous environments" mean here?