1© Cloudera, Inc. All rights reserved.
Effective Spark on Multi-Tenant Clusters
Kostas Sakellis
Me
• Spark Tech Lead Manager at Cloudera
• Contributed to Apache Spark
• Previously, stint on Cloudera Manager
Challenges
• Predictable execution time of Spark jobs
• Prevent Starvation
• Optimal cluster utilization
• Secure Data access
• Configuration Management
Spark on YARN
Why YARN?
• Spark supports pluggable Cluster Managers
• local, Standalone, YARN and Mesos
• YARN contains proper resource manager
• Enables multi-platform jobs
• Spark on YARN is mature with active community
Running an application
spark-submit --master yarn-cluster \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  --class <your-class> <your-app-jar>
System Architecture
[Diagram: host-a.mydomain.com runs the Resource Manager and a Node Manager; host-b.mydomain.com and host-c.mydomain.com each run a Node Manager. Containers on the Node Managers hold the App Master, the Driver, and executors Exec1 to Exec3.]
Gotchas
• Ensure compatible YARN configuration
• yarn.nodemanager.resource.[memory-mb|cpu-vcores]
• yarn.scheduler.maximum-allocation-[vcores|mb]
• ...
• Remember overhead memory
• spark.yarn.executor.memoryOverhead
• Default of 10% since Spark 1.4
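The sizing rule behind these bullets can be sketched as follows. This is a minimal illustration, assuming the Spark 1.4+ default of 10% overhead with a 384 MB floor; the class name, method names, and the 2048 MB YARN limit are hypothetical, and YARN's own rounding to its minimum allocation is not modeled.

```java
public class ExecutorContainerSizing {
    // Assumed defaults for spark.yarn.executor.memoryOverhead (Spark 1.4+):
    // 10% of executor memory, but never less than 384 MB.
    static final double OVERHEAD_FACTOR = 0.10;
    static final long OVERHEAD_MIN_MB = 384;

    /** Default overhead added on top of --executor-memory, in MB. */
    static long defaultOverheadMb(long executorMemoryMb) {
        return Math.max((long) (executorMemoryMb * OVERHEAD_FACTOR), OVERHEAD_MIN_MB);
    }

    /** Memory requested from YARN per executor container, before YARN's rounding. */
    static long containerRequestMb(long executorMemoryMb) {
        return executorMemoryMb + defaultOverheadMb(executorMemoryMb);
    }

    /** True if the request fits under yarn.scheduler.maximum-allocation-mb. */
    static boolean fits(long executorMemoryMb, long maxAllocationMb) {
        return containerRequestMb(executorMemoryMb) <= maxAllocationMb;
    }

    public static void main(String[] args) {
        long executorMb = 2048;      // --executor-memory 2g
        long maxAllocationMb = 2048; // hypothetical yarn.scheduler.maximum-allocation-mb
        System.out.println("overhead MB:  " + defaultOverheadMb(executorMb));
        System.out.println("request MB:   " + containerRequestMb(executorMb));
        System.out.println("fits in YARN: " + fits(executorMb, maxAllocationMb));
    }
}
```

With these assumed numbers a 2g executor asks YARN for more than 2048 MB, so a cluster whose maximum allocation equals the executor memory cannot satisfy the request.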
Otherwise…
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
[...]
System Architecture
[Diagram: the same three hosts (host-a, host-b, host-c), now with Drivers and executors from several applications sharing the Node Managers.]
How do we share a common resource?
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
Resource Management
• YARN can create resource queues
• Priorities can be set per queue
• Preemption is also available
• Preemption handling fixed in Spark 1.6 (SPARK-8167)
• yarn.scheduler.fair.preemption
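To make the idea of weighted queues concrete, here is a simplified sketch of how a fair scheduler might split cluster memory across queues in proportion to their weights. It is an illustration only: the queue names, weights, and cluster size are hypothetical, and real YARN scheduling also accounts for minimum shares, actual demand, and preemption timeouts, none of which are modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueueFairShare {
    /** Splits cluster memory across queues in proportion to their weights. */
    static Map<String, Long> fairShareMb(Map<String, Double> weights, long clusterMb) {
        double totalWeight = 0.0;
        for (double w : weights.values()) {
            totalWeight += w;
        }
        Map<String, Long> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            shares.put(e.getKey(), Math.round(clusterMb * e.getValue() / totalWeight));
        }
        return shares;
    }

    public static void main(String[] args) {
        // Hypothetical queues and weights for a 64 GB cluster.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("etl", 2.0);
        weights.put("adhoc", 1.0);
        weights.put("my-special-queue", 1.0);
        fairShareMb(weights, 65536L)
                .forEach((queue, mb) -> System.out.println(queue + ": " + mb + " MB"));
    }
}
```

A queue that uses more than its share while others are starved is the situation preemption is meant to correct.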
Running an application
spark-submit --master yarn-cluster \
  --queue my-special-queue \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  --class <your-class> <your-app-jar>
How about locality?
Courtesy of: https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg
Task Scheduling
[Diagram: in the Driver, the Spark Context submits Jobs, the DAG Scheduler splits them into Stages separated by Shuffles, and the Task Scheduler sends Tasks to cores on the Executors.]
Locality
[Diagram: three hosts (host-a, host-b, host-c), each running a Node Manager and HDFS; host-a also runs the Resource Manager. Blocks of hdfs://x and hdfs://y are spread across the hosts (x:B1 x:B2 y:B1 y:B3; x:B3 x:B2 y:B2 y:B3; x:B3 x:B1 y:B1 y:B2). The Driver, Exec1, and Exec2 run on these hosts.]
Spark creates executors before executing code!
Underutilized Clusters
Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
Dynamic Allocation
• Spark applications scale the number of executors based on load
• Removes need for: --num-executors
• Idle executors get killed
• First supported in CDH 5.4
• Ideal for:
• Long ETL jobs with large shuffles
• Shell applications: Hive and the Spark shell
Task Scheduling
[Diagram: the Driver (Spark Context, DAG Scheduler, Task Scheduler) works with the RM (Resource Manager); Node Managers on host-a.mydomain.com, host-b.mydomain.com, and host-c.mydomain.com run Exec1, Exec2, and Exec3, which receive Tasks.]
Dynamic Allocation Configuration
• Many Knobs
• spark.dynamicAllocation.enabled
• spark.dynamicAllocation.[min|max|initial]Executors
• spark.dynamicAllocation.executorIdleTimeout
• spark.dynamicAllocation.cachedExecutorIdleTimeout
• ...
• --num-executors will disable dynamic allocation
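The knobs above could be passed to spark-submit as `--conf` pairs. The sketch below only assembles those arguments; the numeric values are placeholders rather than recommendations, the class and method names are hypothetical, and `spark.shuffle.service.enabled` is included on the assumption that the external shuffle service is set up on the Node Managers, which dynamic allocation on YARN depends on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynamicAllocationArgs {
    /** Dynamic allocation settings; values are placeholders, not tuning advice. */
    static Map<String, String> dynamicAllocationConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.dynamicAllocation.enabled", "true");
        // The external shuffle service keeps shuffle files available after executors are removed.
        conf.put("spark.shuffle.service.enabled", "true");
        conf.put("spark.dynamicAllocation.minExecutors", "1");
        conf.put("spark.dynamicAllocation.maxExecutors", "20");
        conf.put("spark.dynamicAllocation.executorIdleTimeout", "60s");
        conf.put("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s");
        return conf;
    }

    /** Turns each setting into a "--conf key=value" pair for spark-submit. */
    static List<String> toSubmitArgs(Map<String, String> conf) {
        List<String> submitArgs = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            submitArgs.add("--conf");
            submitArgs.add(e.getKey() + "=" + e.getValue());
        }
        return submitArgs;
    }

    public static void main(String[] args) {
        // --num-executors is deliberately absent: passing it disables dynamic allocation.
        System.out.println(String.join(" ", toSubmitArgs(dynamicAllocationConf())));
    }
}
```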
Dynamic Allocation Limitations
• Still required to specify cores
• --executor-cores
• Memory
• --executor-memory
• Includes JVM overhead
• Caching
• spark.dynamicAllocation.cachedExecutorIdleTimeout
The Future of Dynamic Allocation
• Only “task size” needed: --task-size
• Eliminates
• --executor-cores
• --num-executors
• --executor-memory
• Leads to better cluster utilization
Dynamic Allocation respects Locality!
Security, oh no!
Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
Security
• Shared resources -> Shared data
• Security has many facets
• Encryption
• Authentication
• Authorization
• Encryption is interesting for multi-tenant clusters
Encryption
Who’s looking at the data?
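Data Flow in Spark
[Diagram: Spark Submit, the Driver, and two Executors, connected by the Control Plane, File Distribution, and Shuffle Blocks channels. The Driver exposes the UI, and each Executor writes Spilled/Shuffle Blocks to its Disk.]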
Prior to Spark 1.6
• Different channel, different method
• Control plane: SSL
• File distribution: SSL
• Shuffle Blocks: SASL Encryption
• User UI / REST API: No Encryption
• Spilled/Shuffle Blocks: Use eCryptfs (or equivalent)
What is wrong with SSL?
Why not SSL?
• SSL can be hard to set up
• Need certificates readable on every node
• Sharing certificates not as secure
• Hard to have per-user certificates
Spark 1.6
• Standardize around a common transport library
• Replaces Akka RPC (SPARK-6028)
• Replaces HTTP File service (SPARK-11140)
• Uses Netty transport library with SASL Encryption
• But...
• WebUI still has no encryption
• Shuffle / Spilled blocks still require FS-level encryption
• SASL in JVM restricted to 3DES – not very strong and slow
Spark 2.0
• REPL class distribution using transport lib (SPARK-11563)
• HTTPS Support for WebUI (SPARK-2750)
• Encrypting spilled blocks is almost available (SPARK-5682)
• Depends on third party Chimera library for encryption
• Work is being done to add Chimera to Apache Commons
• Future:
• Use Chimera to encrypt over-the-wire data
Gateways: launching a Spark application
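Spark Gateway
[Diagram: the user Bob connects over SSH to gateway-a.mydomain.com, which holds the Client, the Client Configs, and the Spark Install. From there the Client reaches the Resource Manager and the Node Managers (host-b.mydomain.com, host-c.mydomain.com) over Random Ports. Drivers and executors (Driver, Exec1, Exec2) are shown alongside.]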
Gateway Considerations
• Gateway hosts actively managed by administrators
• Updates to client configurations and Spark installs
• Users need to tunnel into network
• Difficult to put users behind firewall
• YARN allows different Spark versions
• spark.yarn.jar or spark.yarn.archive
• Shared Spark services make this difficult
Shared Services
[Diagram: the same gateway layout (Bob, SSH, gateway-a.mydomain.com with Client, Client Configs, and Spark Install; Resource Manager and Node Managers on host-b.mydomain.com and host-c.mydomain.com), with shared services (marked S) and the History Service added.]
Alternative: Livy
An open source, Apache-licensed REST web service that manages long-running Spark contexts in your cluster
Livy Architecture
[Diagram: a Client talks over HTTP to the REST Server, which works through the Cluster Manager of the managed cluster to run Context 1 and Context 2, each with its own Driver and Executors.]
Case 1: Spark Application JAR Submission
• Enables Spark applications to be submitted without needing a Spark installation
• Basically a wrapper around spark-submit
% curl -X POST localhost:8998/batches \
    -H "Content-Type: application/json" \
    -d '{
      "file": "<path_to_file>",
      "className": "com.foo.bar.."
      ...
    }'
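The same batch submission could be issued from Java without a Spark installation. This is a minimal sketch using the JDK 11+ `java.net.http` client; the server URL, JAR path, and class name are hypothetical, only the `file` and `className` fields from the slide are sent, and the values are assumed to need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LivyBatchSubmit {
    /** Builds the JSON body for POST /batches; assumes values need no JSON escaping. */
    static String batchBody(String file, String className) {
        return "{\"file\": \"" + file + "\", \"className\": \"" + className + "\"}";
    }

    /** Builds the HTTP request without sending it. */
    static HttpRequest batchRequest(String livyUrl, String file, String className) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(batchBody(file, className)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical JAR location and main class.
        String file = "hdfs:///apps/my-app.jar";
        String className = "com.example.MyApp";
        if (args.length == 0) {
            // No server URL given: just show what would be sent.
            System.out.println(batchBody(file, className));
            return;
        }
        // args[0] is the Livy server URL, for example http://localhost:8998
        HttpRequest request = batchRequest(args[0], file, className);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Run without arguments it prints the request body; run with a server URL it submits the batch and prints the server's reply.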
How do you retrieve results?
Case 2: Fine grained Job submission
• Programmatic submission of Spark jobs to a long-running application
• A thin Java (and Scala) client available for easier integration
• Provides automatic serialization/deserialization
• Enables Web/Mobile applications to use Spark as a backend
Case 2: Example
// Create Livy Client
LivyClient client = new LivyClientBuilder(false)
    .setURI(new URI("<uri>"))
    .setAll(<config>)
    .build();
// JobHandle allows monitoring of jobs
JobHandle<Long> handle = client.submit(new YourJob());
// Block until results are returned
handle.get(TIMEOUT, TimeUnit.SECONDS);
// Close connections
client.stop();
Case 2: Example
private static class YourJob implements Job<Long> {
    @Override
    public Long call(JobContext jc) {
        // Arrays.asList yields a List<Integer>, matching the RDD element type
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> rdd = jc.sc().parallelize(list);
        return rdd.count();
    }
}
// Job interface to implement
public interface Job<T> extends Serializable {
    T call(JobContext jc) throws Exception;
}
Contributions Welcome!
• http://livy.io/
• Code: https://github.com/cloudera/livy
• JIRA: https://issues.cloudera.org/browse/LIVY
• Users: http://groups.google.com/a/cloudera.org/group/livy-user
• Dev: http://groups.google.com/a/cloudera.org/group/livy-dev
Thank you

08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Effective Spark on Multi-Tenant Clusters

  • 1. 1© Cloudera, Inc. All rights reserved. Effective Spark on Multi-Tenant Clusters Kostas Sakellis
  • 2. Me • Spark Tech Lead Manager at Cloudera • Contributed to Apache Spark • Previously, stint on Cloudera Manager
  • 3. Challenges • Predictable execution time of Spark jobs • Prevent starvation • Optimal cluster utilization • Secure data access • Configuration management
  • 4. Spark on YARN
  • 5. Why YARN? • Spark supports pluggable cluster managers • local, Standalone, YARN and Mesos • YARN contains a proper resource manager • Enables multi-platform jobs • Spark on YARN is mature with an active community
  • 6. Running an application: spark-submit --master yarn-cluster --executor-memory 2g --num-executors 3 --num-cores 2 <your-class>
  • 7. System Architecture (diagram labels: host-a.mydomain.com, Host-b.mydomain.com, Host-c.mydomain.com; Resource Manager; Node Manager; Container; App Master; Driver; Exec1, Exec2, Exec3)
  • 8. Gotchas • Ensure compatible YARN configuration • yarn.nodemanager.resource.[memory-mb|cpu-vcores] • yarn.scheduler.maximum-allocation-[vcores|mb] • ... • Remember overhead memory • spark.yarn.executor.memoryOverhead • Default of 10% since Spark 1.4
  • 9. Otherwise… Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
  • 10. Otherwise… Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
  • 11. System Architecture (diagram labels: host-a.mydomain.com, Host-b.mydomain.com, Host-c.mydomain.com; Resource Manager; Node Manager; Driver; Exec1, Exec2, Exec3, shown for several applications)
  • 12. How do we share a common resource? Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
  • 13. Resource Management • YARN can create resource queues • Priorities can be set per queue • Preemption is also available • Fixed in Spark 1.6 (SPARK-8167) • yarn.scheduler.fair.preemption
  • 14. Running an application: spark-submit --master yarn-cluster --queue my-special-queue --executor-memory 2g --num-executors 3 --num-cores 2 <your-class>
  • 15. How about locality? Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg Courtesy of: https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg
  • 16. Task Scheduling (diagram labels: Driver; Spark Context; Job; DAG Scheduler; Stage; Task Scheduler; Executor; Core; Task; Shuffle)
  • 17. Locality (diagram labels: host-a.mydomain.com, Host-b.mydomain.com, Host-c.mydomain.com; Resource Manager; Node Manager; Driver; Exec1, Exec2; files hdfs://x and hdfs://y; HDFS block sets "x:B1 x:B2 y:B1 y:B3", "x:B3 x:B2 y:B2 y:B3" and "x:B3 x:B1 y:B1 y:B2")
  • 18. Spark creates executors before executing code!
  • 19. Underutilized Clusters Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
  • 20. Dynamic Allocation • Spark applications scale the number of executors based on load • Removes need for: --num-executors • Idle executors get killed • First supported in CDH 5.4 • Ideal for: • Long ETL jobs with large shuffles • Shell applications: hive and spark shell
  • 21. Task Scheduling (diagram labels: Driver; Spark Context; Job; DAG Scheduler; Stage; Task Scheduler; RM; host-a.mydomain.com, host-b.mydomain.com, host-c.mydomain.com; Node Manager; Exec1, Exec2, Exec3; Task)
  • 22. Dynamic Allocation Configuration • Many knobs • spark.dynamicAllocation.enabled • spark.dynamicAllocation.[min|max|initial]Executors • spark.dynamicAllocation.executorIdleTimeout • spark.dynamicAllocation.cachedExecutorIdleTimeout • ... • --num-executors will disable dynamic allocation
  • 23. Dynamic Allocation Limitations • Still required to specify cores • --num-cores • Memory • --executor-memory • Includes JVM overhead • Caching • spark.dynamicAllocation.cachedExecutorIdleTimeout
  • 24. The Future of Dynamic Allocation • Only "task size" needed: --task-size • Eliminates • --num-cores • --num-executors • --executor-memory • Leads to better cluster utilization
  • 25. Dynamic Allocation respects Locality!
  • 26. Security, oh no! Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
  • 27. Security • Shared resources -> Shared data • Security has many facets • Encryption • Authentication • Authorization • Encryption is interesting for multi-tenant clusters
  • 28. Encryption: Who's looking at the data?
  • 29. Data Flow in Spark (diagram labels: Spark Submit; Driver; Executor; Disk; UI; channels: Control Plane, File Distribution, Shuffle Blocks, Spilled/Shuffle Blocks)
  • 30. Prior to Spark 1.6 • Different channel, different method • Control plane: SSL • File distribution: SSL • Shuffle Blocks: SASL Encryption • User UI / REST API: No Encryption • Spilled/Shuffle Blocks: Use encrypfs (or equivalent)
  • 31. What is wrong with SSL?
  • 32. Why not SSL? • SSL can be hard to set up • Need certificates readable on every node • Sharing certificates not as secure • Hard to have per-user certificate
  • 33. Spark 1.6 • Standardize around a common transport library • Replaces Akka RPC (SPARK-6028) • Replaces HTTP File service (SPARK-11140) • Uses Netty transport library with SASL Encryption • But: • WebUI still has no encryption • Shuffle / Spilled blocks still require FS-level encryption • SASL in the JVM is restricted to 3DES, which is slow and not very strong
  • 34. Spark 2.0 • REPL class distribution using transport lib (SPARK-11563) • HTTPS Support for WebUI (SPARK-2750) • Encrypting spilled blocks is almost available (SPARK-5682) • Depends on third party Chimera library for encryption • Work is being done to add Chimera to Apache Commons • Future: • Use Chimera to encrypt over-the-wire data
  • 35. Gateways: launching Spark Application Courtesy of:
  • 36. Spark Gateway (diagram labels: Bob; SSH; gateway-a.mydomain.com; Client; Client Configs; Spark Install; Random Ports; Resource Manager; Host-b.mydomain.com, Host-c.mydomain.com; Node Manager; Driver; Exec1, Exec2)
  • 37. Gateway Considerations • Gateway hosts actively managed by administrators • Updates to client configurations and Spark installs • Users need to tunnel into network • Difficult to put users behind firewall • YARN allows different Spark versions • spark.yarn.jar or spark.yarn.archive • Shared Spark services make this difficult
  • 38. Shared Services (diagram labels: Bob; SSH; gateway-a.mydomain.com; Client; Client Configs; Spark Install; Random Ports; Resource Manager; Host-b.mydomain.com, Host-c.mydomain.com; Node Manager; Driver; Exec1, Exec2; S; History Service)
  • 39. Alternative: An open source Apache licensed REST web service that manages long running Spark contexts in your cluster
  • 40. Livy Architecture (diagram labels: Client; HTTP; Rest Server; Cluster Manager; The Managed Cluster; Driver; Executor; Context 1; Context 2)
  • 41. Case 1: Spark Application JAR Submission • Enables Spark applications to be submitted without needing a Spark installation • Basically a wrapper around spark-submit % curl -XPOST localhost:8998/batches -d '{ "file": "<path_to_file>", "className": "com.foo.bar..", ... }'
  • 42. How do you retrieve results?
  • 43. Case 2: Fine grained Job submission • Programmatic submission of Spark jobs to a long running application • A thin Java (and Scala) client available for easier integration • Provides automatic serialization/deserialization • Enables Web/Mobile applications to use Spark as a backend
  • 44. Case 2: Example // Create Livy Client LivyClient client = new LivyClientBuilder(false) .setURI(new URI("<uri>")) .setAll(<config>) .build(); // JobHandle allows monitoring of jobs JobHandle<Long> handle = client.submit(new YourJob()); // Block until results are returned handle.get(TIMEOUT, TimeUnit.SECONDS); // Close connections client.stop();
  • 45. Case 2: Example private static class YourJob implements Job<Long> { @Override public Long call(JobContext jc) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5); JavaRDD<Integer> rdd = jc.sc().parallelize(list); return rdd.count(); } } // Job Interface to Implement public interface Job<T> extends Serializable { T call(JobContext jc) throws Exception; }
  • 46. Contributions Welcome! • http://livy.io/ • Code: https://github.com/cloudera/livy • JIRA: https://issues.cloudera.org/browse/LIVY • Users: http://groups.google.com/a/cloudera.org/group/livy-user • Dev: http://groups.google.com/a/cloudera.org/group/livy-dev
  • 47. Thank you
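
The "Gotchas" and "Otherwise…" slides (8 to 10) come down to one piece of arithmetic: the container YARN has to grant is the requested executor memory plus the overhead. A minimal sketch of that calculation follows, assuming the 10% default quoted on slide 8 and a 384 MB floor on the overhead. The floor comes from Spark's YARN documentation rather than from the slides, the function name is hypothetical, and the sketch ignores YARN rounding requests up to its allocation increment.

```python
def yarn_container_mb(executor_memory_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Estimate the memory YARN must grant for one executor container.

    overhead_fraction: the 10% default mentioned on slide 8.
    min_overhead_mb:   assumed floor on the overhead (not stated in the slides).
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    # Compare the request size for a small and a large executor.
    for requested_mb in (2048, 8192):
        print(requested_mb, "MB executor ->", yarn_container_mb(requested_mb), "MB container")
```

Under these assumptions, --executor-memory 2g becomes a request for roughly 2.4 GB, so yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb both need to leave room for more than the executor memory alone.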

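The dynamic allocation knobs on slide 22 can be collected into spark-submit arguments. The sketch below is one way to do that, not a recommended configuration: the helper names and the example values are hypothetical, and the spark.shuffle.service.enabled entry is an assumption drawn from Spark's documentation (dynamic allocation on YARN relies on the external shuffle service), not from the slides.

```python
def dynamic_allocation_conf(min_executors, max_executors, idle_timeout_s=60):
    """Build Spark properties that turn on dynamic allocation."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumption: the external shuffle service must be enabled as well.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_timeout_s}s",
    }


def to_submit_args(conf):
    """Turn a property dict into a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + to_submit_args(dynamic_allocation_conf(1, 10))))
```

The generated arguments deliberately omit --num-executors, since slide 22 notes that passing it disables dynamic allocation.
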
Editor's notes

  1. This shows up in the YARN NodeManager logs
  2. Allow multiple groups to access shared resources while ensuring some dedicated share of the resource
  3. Allow multiple groups to access shared resources while ensuring some dedicated share of the resource
  4. Spark makes building a proof of concept with a subset of data relatively easy.
  5. Every connection in the previous slide can transmit sensitive data: input data transmitted via broadcast variables, computed data during shuffles, data in serialized tasks, and files uploaded with the job. How do we prevent other users from seeing this data?
  6. Spark makes building a proof of concept with a subset of data relatively easy.
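
Note 5 asks how to keep other users from seeing data on the wire, and slide 33 answers with SASL encryption over the Netty transport. A minimal sketch of the matching settings is given below, assuming the spark.authenticate and spark.authenticate.enableSaslEncryption properties described in the Spark 1.6 security documentation; neither name appears in the slides, and the helper names are hypothetical. As slide 33 points out, this does not cover the WebUI or spilled blocks on disk.

```python
def sasl_encryption_conf():
    """Properties assumed to enable authenticated, SASL-encrypted transport."""
    return {
        "spark.authenticate": "true",
        "spark.authenticate.enableSaslEncryption": "true",
    }


def render_properties(conf):
    """Render a property dict in spark-defaults.conf style: one 'key value' per line."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(render_properties(sasl_encryption_conf()))
```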