1© Cloudera, Inc. All rights reserved.
Why your Spark Job is Failing
Kostas Sakellis
2© Cloudera, Inc. All rights reserved.
Me
• Software Engineering at Cloudera
• Contributor to Apache Spark
• Before that, worked on Cloudera Manager
3© Cloudera, Inc. All rights reserved.
com.esotericsoftware.kryo.KryoException: Unable to find class:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
4© Cloudera, Inc. All rights reserved.
We go about our day ignoring manholes until…
Courtesy of: http://www.independent.co.uk/incoming/article9127706.ece/binary/original/maholev23.jpg
5© Cloudera, Inc. All rights reserved.
… something goes wrong.
Courtesy of: http://greenpointers.com/wp-content/uploads/2015/03/Manhole-Explosion1.jpg
6© Cloudera, Inc. All rights reserved.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException:
For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at scala.collection.immutable.StringLike
[...]
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
8© Cloudera, Inc. All rights reserved.
Job? What now?
Courtesy of: http://calvert.lib.md.us/jobs_pic.jpg
9© Cloudera, Inc. All rights reserved.
Example
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
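Nothing in this chain runs until `.sum()` is called: `map` and `filter` only describe work, and the action at the end is what submits a job. A minimal sketch of that behavior, using a plain Scala `Iterator` as a stand-in for an RDD so it runs without a cluster. The class name `CountingPipeline` and the `parsed` counter are illustrative and not part of the Spark API.

```scala
// Stand-in for the slide's pipeline: a lazy Iterator instead of an RDD.
// CountingPipeline and `parsed` are hypothetical names used only here.
final class CountingPipeline(lines: Seq[String]) {
  // Counts how many lines have actually been parsed so far.
  var parsed: Int = 0

  // Like map/filter on an RDD, these calls only describe the work.
  val transformed: Iterator[Int] =
    lines.iterator
      .map { x => parsed += 1; x.toInt }
      .filter(_ > 10)

  // Like sum() on an RDD, this is the step that forces evaluation.
  def runJob(): Int = transformed.sum
}
```

Building the pipeline leaves `parsed` at zero; only `runJob()` touches the input, which is also why a bad record surfaces at the action and not at the `map` call.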
12© Cloudera, Inc. All rights reserved.
Then what the heck is a stage?
Courtesy of: https://writinginadeadworld.files.wordpress.com/2014/03/rock1.jpeg
13© Cloudera, Inc. All rights reserved.
Partitions
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
HDFS -> Partition 1, Partition 2, Partition 3, Partition 4
14© Cloudera, Inc. All rights reserved.
RDDs
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
HDFS -> RDD1 (Partitions 1-4)
15© Cloudera, Inc. All rights reserved.
RDDs
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
HDFS -> RDD1 (Partitions 1-4) -> RDD2 (Partitions 1-4)
16© Cloudera, Inc. All rights reserved.
RDDs
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
HDFS -> RDD1 (Partitions 1-4) -> RDD2 (Partitions 1-4) -> RDD3 (Partitions 1-4)
17© Cloudera, Inc. All rights reserved.
RDDs
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
HDFS -> RDD1 (Partitions 1-4) -> RDD2 (Partitions 1-4) -> RDD3 (Partitions 1-4) -> Sum
18© Cloudera, Inc. All rights reserved.
RDD Lineage
sc.textFile("hdfs://…", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
HDFS -> RDD1 (Partitions 1-4) -> RDD2 (Partitions 1-4) -> RDD3 (Partitions 1-4) -> Sum
Lineage
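Lineage means each RDD remembers its parent and the function that derived it, so any single partition can be rebuilt by replaying the chain back to the source. A toy model of that idea in plain Scala follows. `Node`, `Source`, `Mapped`, `Filtered` and the sample data are hypothetical names and values for this sketch and do not mirror Spark's classes.

```scala
// A toy lineage graph. All names here are illustrative, not Spark classes.
sealed trait Node[A] {
  def compute(partition: Int): Vector[A] // rebuild one partition on demand
  def lineage: List[String]              // newest step first
}

final case class Source(name: String, partitions: Vector[Vector[String]]) extends Node[String] {
  def compute(partition: Int): Vector[String] = partitions(partition)
  def lineage: List[String] = List(name)
}

final case class Mapped[A, B](parent: Node[A], f: A => B) extends Node[B] {
  // A partition of the child depends only on the same partition of the parent.
  def compute(partition: Int): Vector[B] = parent.compute(partition).map(f)
  def lineage: List[String] = "map" :: parent.lineage
}

final case class Filtered[A](parent: Node[A], p: A => Boolean) extends Node[A] {
  def compute(partition: Int): Vector[A] = parent.compute(partition).filter(p)
  def lineage: List[String] = "filter" :: parent.lineage
}

object LineageSketch {
  // Four partitions of made-up input, standing in for the HDFS file.
  val rdd1 = Source("textFile", Vector(Vector("5", "12"), Vector("40"), Vector("7"), Vector("11")))
  val rdd2 = Mapped(rdd1, (x: String) => x.toInt)
  val rdd3 = Filtered(rdd2, (n: Int) => n > 10)

  // The action: compute every partition and combine the partial sums.
  def sum: Int = (0 until 4).map(i => rdd3.compute(i).sum).sum
}
```

Calling `LineageSketch.rdd3.compute(0)` recomputes just the first partition from the source, which is the property that lets a failed task be retried in isolation.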
19© Cloudera, Inc. All rights reserved.
RDD Dependencies
• Narrow and Wide Dependencies
HDFS -> RDD1 (Partitions 1-4) -> RDD2 (Partitions 1-4) -> RDD3 (Partitions 1-4) -> Sum
Narrow Dependencies
20© Cloudera, Inc. All rights reserved.
Wide Dependencies
• Sometimes records need to be grouped together
• Examples
• join
• groupByKey
• Stages created at wide dependency boundaries
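The difference between the two dependency types can be sketched with plain Scala collections: a narrow operation touches each partition on its own, while a wide one has to gather records with the same key from every input partition. `DependencySketch`, its method names, and the hash-based routing are assumptions of this illustration, not Spark's implementation.

```scala
object DependencySketch {
  type Partition[A] = Vector[A]

  // Narrow: output partition i is computed from input partition i only.
  def narrowMap[A, B](parts: Vector[Partition[A]])(f: A => B): Vector[Partition[B]] =
    parts.map(_.map(f))

  // Wide: every output partition may need records from every input partition,
  // so all inputs are read and routed by key hash before grouping.
  def groupByKey[K, V](parts: Vector[Partition[(K, V)]], numOut: Int): Vector[Partition[(K, Vector[V])]] = {
    val all = parts.flatten
    Vector.tabulate(numOut) { i =>
      all.filter { case (k, _) => Math.floorMod(k.hashCode, numOut) == i }
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2) }
        .toVector
    }
  }
}
```

The `parts.flatten` step is the stand-in for the shuffle: it is the point where data has to cross partition boundaries, and therefore where a stage has to end.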
21© Cloudera, Inc. All rights reserved.
A More Interesting Spark Job
val rdd1 = sc.textFile("hdfs://...")
  .map(someFunc)
  .filter(filterFunc)
val rdd2 = sc.hadoopFile("hdfs://...")
  .groupByKey()
  .map(someOtherFunc)
val rdd3 = rdd1.join(rdd2)
  .map(someFunc)
rdd3.collect()
22© Cloudera, Inc. All rights reserved.
A More Interesting Spark Job
val rdd1 = sc.textFile("hdfs://...")
  .map(someFunc)
  .filter(filterFunc)
textFile -> map -> filter
23© Cloudera, Inc. All rights reserved.
A More Interesting Spark Job
val rdd2 = sc.hadoopFile("hdfs://...")
  .groupByKey()
  .map(someOtherFunc)
hadoopFile -> groupByKey -> map
24© Cloudera, Inc. All rights reserved.
A More Interesting Spark Job
val rdd3 = rdd1.join(rdd2)
  .map(someFunc)
join -> map
25© Cloudera, Inc. All rights reserved.
A More Interesting Spark Job
rdd3.collect()
textFile -> map -> filter
hadoopFile -> groupByKey -> map
join -> map
Wide Dependencies
1 2 3 4
26© Cloudera, Inc. All rights reserved.
Get to the point before I stop caring!
27© Cloudera, Inc. All rights reserved.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException:
For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at scala.collection.immutable.StringLike
[...]
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
28© Cloudera, Inc. All rights reserved.
What was the failure?
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException:
For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
[...]
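The input string in the message is a whole comma-separated record, which suggests the job handed the entire line to `toDouble` without splitting it first. A hedged sketch of a more defensive parse is below. `ParseSketch.parseLine` is a hypothetical helper, and returning `None` for malformed lines is one possible policy, not necessarily what the original job should do.

```scala
import scala.util.Try

object ParseSketch {
  // Splits a comma-separated record and parses every field as a Double.
  // Returns None instead of throwing when any field is malformed.
  def parseLine(line: String): Option[Vector[Double]] =
    Try(line.split(",").toVector.map(_.trim.toDouble)).toOption
}
```

In a Spark job such a helper would typically sit inside a `flatMap`, so that bad records are dropped (or counted) instead of failing the task four times and aborting the job.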
29© Cloudera, Inc. All rights reserved.
What was the failure?
Stage
Task Task
Task Task
31© Cloudera, Inc. All rights reserved.
What was the failure?
Stage
Task Task
Task Task
spark.task.maxFailures=4
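The "failed 4 times ... Lost task 1.3" wording lines up with this setting: attempts are numbered from 0, and after the fourth failed attempt the stage, and with it the job, is aborted. A simplified model of that retry loop follows. `RetrySketch.runTask` is a hypothetical helper, it assumes `maxFailures` is at least 1, and it is not how Spark's scheduler is actually written.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Runs `task` with attempt numbers 0, 1, 2, ... and gives up once
  // maxFailures attempts have failed. A simplified model only.
  def runTask[T](maxFailures: Int)(task: Int => T): Either[String, T] = {
    @annotation.tailrec
    def loop(attempt: Int): Either[String, T] =
      Try(task(attempt)) match {
        case Success(value) => Right(value)
        case Failure(e) if attempt + 1 >= maxFailures =>
          Left(s"failed $maxFailures times, most recent failure: attempt $attempt, ${e.getClass.getName}")
        case Failure(_) => loop(attempt + 1)
      }
    loop(0)
  }
}
```

A deterministic error such as a parse failure burns through every attempt, whereas a transient one (a lost executor, say) can succeed on a later attempt; raising the setting only helps in the second case.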
32© Cloudera, Inc. All rights reserved.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException:
For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at scala.collection.immutable.StringLike
[...]
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
34© Cloudera, Inc. All rights reserved.
ERROR executor.Executor: Exception in task ID 2866
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
[...]
35© Cloudera, Inc. All rights reserved.
Spark Architecture
36© Cloudera, Inc. All rights reserved.
YARN Architecture
Resource Manager
Node Manager
Container Container
Node Manager
Container Container
Application
Master
Client
Process Process
37© Cloudera, Inc. All rights reserved.
Spark on YARN Architecture
Resource Manager
Node Manager
Container Container
Node Manager
Container Container
Client
Process Process
38© Cloudera, Inc. All rights reserved.
Spark on YARN Architecture
Resource Manager
Node Manager
Container Container
Node Manager
Container Container
Application
Master
Client
Process Process
39© Cloudera, Inc. All rights reserved.
spark-submit --executor-memory 2g \
  --master yarn-client \
  --num-executors 2 \
  --executor-cores 2
40© Cloudera, Inc. All rights reserved.
Container [pid=63375,containerID=container_1388158490598_0001_01_000003]
is running beyond physical memory limits. Current usage: 2.2 GB of 2.1 GB
physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
[...]
42© Cloudera, Inc. All rights reserved.
spark-submit --executor-memory 2g \
  --master yarn-client \
  --num-executors 2 \
  --executor-cores 2
43© Cloudera, Inc. All rights reserved.
Memory allocation
yarn.nodemanager.resource.memory-mb
• Executor Container
  • spark.yarn.executor.memoryOverhead (7%; 10% in Spark 1.4)
  • spark.executor.memory
    • spark.shuffle.memoryFraction (0.4)
    • spark.storage.memoryFraction (0.6)
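The container that YARN polices is larger than the JVM heap: it is `spark.executor.memory` plus `spark.yarn.executor.memoryOverhead`. The arithmetic can be sketched as below, assuming the overhead is a fraction of executor memory with a fixed floor. The 384 MB floor and the names in `ContainerSizeSketch` are assumptions of this sketch, so check the defaults for your Spark version.

```scala
object ContainerSizeSketch {
  // Assumed floor for the overhead, in MB; an assumption of this sketch.
  val MinOverheadMb = 384

  // Overhead is a fraction of executor memory, but never below the floor.
  def overheadMb(executorMemoryMb: Int, overheadFraction: Double): Int =
    math.max((executorMemoryMb * overheadFraction).toInt, MinOverheadMb)

  // Total memory the executor's YARN container would need to be granted.
  def containerMb(executorMemoryMb: Int, overheadFraction: Double): Int =
    executorMemoryMb + overheadMb(executorMemoryMb, overheadFraction)
}
```

Under these assumptions, `--executor-memory 2g` at 7% asks for 2048 + 384 MB, because the floor exceeds 7% of 2 GB. If off-heap usage pushes the process past the container size, YARN kills it with the "running beyond physical memory limits" message.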
44© Cloudera, Inc. All rights reserved.
Sometimes jobs run slow or even…
Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
45© Cloudera, Inc. All rights reserved.
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
[...]
46© Cloudera, Inc. All rights reserved.
GC Stalls
47© Cloudera, Inc. All rights reserved.
Too much spilling!
Courtesy of: http://tgnp.me/wp-content/uploads/2014/05/spilled-starbucks.jpg
48© Cloudera, Inc. All rights reserved.
Shuffle Boundaries
textFile -> map -> filter
hadoopFile -> groupByKey -> map
join -> map
Shuffle
49© Cloudera, Inc. All rights reserved.
Most performance issues are in shuffles!
50© Cloudera, Inc. All rights reserved.
Inside a Task: Fetch & Aggregate
Block -> deserialize
Block -> deserialize
ExternalAppendOnlyMap
key1 -> values
key2 -> values
key3 -> values
key4 -> values
Sort & Spill
key1 -> values
key2 -> values
key3 -> values
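The fetch-and-aggregate step can be modeled as a map that collects values per key and, once it holds too many keys, is sorted and spilled before aggregation continues. The sketch below is a deliberately simplified model, not Spark's `ExternalAppendOnlyMap`: spilled runs are kept in memory instead of on disk, the limit counts keys instead of bytes, and `SpillSketch` and its parameters are hypothetical.

```scala
import scala.collection.mutable

object SpillSketch {
  // Aggregates (key, value) records. Whenever the in-memory map holds more
  // than maxKeysInMemory distinct keys, it is sorted by key and "spilled"
  // (kept here as an in-memory run; a real implementation writes to disk).
  // Returns the merged result and the number of spills that occurred.
  def aggregate(records: Iterator[(String, Int)], maxKeysInMemory: Int): (Map[String, Vector[Int]], Int) = {
    val current = mutable.Map.empty[String, Vector[Int]]
    val spills = mutable.ArrayBuffer.empty[Vector[(String, Vector[Int])]]
    for ((k, v) <- records) {
      current(k) = current.getOrElse(k, Vector.empty) :+ v
      if (current.size > maxKeysInMemory) {
        spills += current.toVector.sortBy(_._1)
        current.clear()
      }
    }
    // Merge the spilled runs with whatever is still in memory.
    val runs = spills.toVector :+ current.toVector.sortBy(_._1)
    val merged = runs.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.flatMap(_._2) }
    (merged, spills.size)
  }
}
```

The spill count is the number to watch: the same input produces fewer spills when each task sees fewer keys, which is the motivation for raising the number of partitions.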
51© Cloudera, Inc. All rights reserved.
Inside a Task: Specify partitions
rdd.reduceByKey(reduceFunc, numPartitions = 1000)
52© Cloudera, Inc. All rights reserved.
Why not set partitions to ∞ ?
53© Cloudera, Inc. All rights reserved.
Excessive parallelism
• Overwhelming scheduler overhead
• More fetches -> more disk seeks
• Driver needs to track state per-task
54© Cloudera, Inc. All rights reserved.
So how to choose?
• Easy answer:
• Keep multiplying by 1.5 and see what works
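The "multiply by 1.5" search can be written down as a short list of candidate partition counts to try in turn. This is only a sketch of the rule of thumb: `PartitionSearchSketch`, the starting value, and the rounding up are assumptions, and judging "what works" still means running the job and watching spills and task times.

```scala
object PartitionSearchSketch {
  // Candidate partition counts: start, start * 1.5, start * 1.5^2, ...
  // Each step is rounded up so the sequence always grows.
  def candidates(start: Int, steps: Int): Vector[Int] =
    Iterator.iterate(start)(n => math.ceil(n * 1.5).toInt).take(steps).toVector
}
```

Each candidate would be passed as the `numPartitions` argument of the shuffle operation, stopping at the first value where the job runs acceptably.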
55© Cloudera, Inc. All rights reserved.
Is Spark bad?
Courtesy of: https://theferkel.files.wordpress.com/2015/04/250474-breaking-bad-quotes.jpg
56© Cloudera, Inc. All rights reserved.
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionDatabricks
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdfAmit Raj
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 

Was ist angesagt? (20)

Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdf
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 

Ähnlich wie Why your Spark Job is Failing

Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetAchieve Internet
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Cloudera, Inc.
 
Docker Security workshop slides
Docker Security workshop slidesDocker Security workshop slides
Docker Security workshop slidesDocker, Inc.
 
New Docker Features for Orchestration and Containers
New Docker Features for Orchestration and ContainersNew Docker Features for Orchestration and Containers
New Docker Features for Orchestration and ContainersJeff Anderson
 
What's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
What's New in Docker 1.12 by Mike Goelzer and Andrea LuzzardiWhat's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
What's New in Docker 1.12 by Mike Goelzer and Andrea LuzzardiDocker, Inc.
 
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiWhat's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiMike Goelzer
 
Things I've learned working with Docker Support
Things I've learned working with Docker SupportThings I've learned working with Docker Support
Things I've learned working with Docker SupportSujay Pillai
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 
Hardening cassandra q2_2016
Hardening cassandra q2_2016Hardening cassandra q2_2016
Hardening cassandra q2_2016zznate
 
Securing Cassandra for Compliance
Securing Cassandra for ComplianceSecuring Cassandra for Compliance
Securing Cassandra for ComplianceDataStax
 
Drupalcon2007 Sun
Drupalcon2007 SunDrupalcon2007 Sun
Drupalcon2007 Sunsmattoon
 
Cloud-native applications with Java and Kubernetes - Yehor Volkov
 Cloud-native applications with Java and Kubernetes - Yehor Volkov Cloud-native applications with Java and Kubernetes - Yehor Volkov
Cloud-native applications with Java and Kubernetes - Yehor VolkovKuberton
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at ClouderaDataconomy Media
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark OperationsCloudera, Inc.
 
Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...
Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...
Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...Patrick Chanezon
 

Ähnlich wie Why your Spark Job is Failing (20)

Spark etl
Spark etlSpark etl
Spark etl
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and Puppet
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
 
Docker Security workshop slides
Docker Security workshop slidesDocker Security workshop slides
Docker Security workshop slides
 
New Docker Features for Orchestration and Containers
New Docker Features for Orchestration and ContainersNew Docker Features for Orchestration and Containers
New Docker Features for Orchestration and Containers
 
What's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
What's New in Docker 1.12 by Mike Goelzer and Andrea LuzzardiWhat's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
What's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
 
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiWhat's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
 
Things I've learned working with Docker Support
Things I've learned working with Docker SupportThings I've learned working with Docker Support
Things I've learned working with Docker Support
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
 
Hardening cassandra q2_2016
Hardening cassandra q2_2016Hardening cassandra q2_2016
Hardening cassandra q2_2016
 
Securing Cassandra for Compliance
Securing Cassandra for ComplianceSecuring Cassandra for Compliance
Securing Cassandra for Compliance
 
Drupalcon2007 Sun
Drupalcon2007 SunDrupalcon2007 Sun
Drupalcon2007 Sun
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Cloud-native applications with Java and Kubernetes - Yehor Volkov
 Cloud-native applications with Java and Kubernetes - Yehor Volkov Cloud-native applications with Java and Kubernetes - Yehor Volkov
Cloud-native applications with Java and Kubernetes - Yehor Volkov
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...
Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...
Docker San Francisco Meetup April 2015 - The Docker Orchestration Ecosystem o...
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix

Why your Spark Job is Failing

  • 1. 1© Cloudera, Inc. All rights reserved. Why your Spark Job is Failing Kostas Sakellis
  • 2. Me • Software Engineering at Cloudera • Contributor to Apache Spark • Before that, worked on Cloudera Manager
  • 3. com.esotericsoftware.kryo.KryoException: Unable to find class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
  • 4. We go about our day ignoring manholes until… Courtesy of: http://www.independent.co.uk/incoming/article9127706.ece/binary/original/maholev23.jpg
  • 5. … something goes wrong. Courtesy of: http://greenpointers.com/wp-content/uploads/2015/03/Manhole-Explosion1.jpg
  • 6. org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
      at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
      at java.lang.Double.parseDouble(Double.java:540)
      at scala.collection.immutable.StringLike [...]
      Driver stacktrace:
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
      [...]
  • 8. Job? What now? Courtesy of: http://calvert.lib.md.us/jobs_pic.jpg
  • 9. Example
      sc.textFile("hdfs://…", 4)
        .map((x) => x.toInt)
        .filter(_ > 10)
        .sum()
  • 12. Then what the heck is a stage? Courtesy of: https://writinginadeadworld.files.wordpress.com/2014/03/rock1.jpeg
  • 13. Partitions sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Diagram: HDFS → Partition 1, Partition 2, Partition 3, Partition 4]
  • 14. RDDs sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Diagram: HDFS → RDD1 (Partitions 1 to 4)]
  • 15. RDDs sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Diagram: HDFS → RDD1 (Partitions 1 to 4) → RDD2 (Partitions 1 to 4)]
  • 16. RDDs sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Diagram: HDFS → RDD1 (Partitions 1 to 4) → RDD2 (Partitions 1 to 4) → RDD3 (Partitions 1 to 4)]
  • 17. RDDs sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Diagram: HDFS → RDD1 (Partitions 1 to 4) → RDD2 (Partitions 1 to 4) → RDD3 (Partitions 1 to 4) → Sum]
  • 18. RDD Lineage sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Diagram: HDFS → RDD1 (Partitions 1 to 4) → RDD2 (Partitions 1 to 4) → RDD3 (Partitions 1 to 4) → Sum; label: Lineage]
  • 19. RDD Dependencies • Narrow and Wide Dependencies [Diagram: HDFS → RDD1 (Partitions 1 to 4) → RDD2 (Partitions 1 to 4) → RDD3 (Partitions 1 to 4) → Sum; label: Narrow Dependencies]
  • 20. Wide Dependencies • Sometimes records need to be grouped together • Examples: join, groupByKey • Stages are created at wide dependency boundaries
  • 21. A More Interesting Spark Job
      val rdd1 = sc.textFile("hdfs://...")
        .map(someFunc)
        .filter(filterFunc)
      val rdd2 = sc.hadoopFile("hdfs://...")
        .groupByKey()
        .map(someOtherFunc)
      val rdd3 = rdd1.join(rdd2)
        .map(someFunc)
      rdd3.collect()
  • 22. A More Interesting Spark Job val rdd1 = sc.textFile("hdfs://...").map(someFunc).filter(filterFunc) [Diagram: textFile → map → filter]
  • 23. A More Interesting Spark Job val rdd2 = sc.hadoopFile("hdfs://...").groupByKey().map(someOtherFunc) [Diagram: hadoopFile → groupByKey → map]
  • 24. A More Interesting Spark Job val rdd3 = rdd1.join(rdd2).map(someFunc) [Diagram: join → map]
  • 25. A More Interesting Spark Job rdd3.collect() [Diagram: textFile → map → filter; hadoopFile → groupByKey → map; join → map; labels: Wide Dependencies, 1, 2, 3, 4]
  • 26. Get to the point before I stop caring!
  • 28. What was the failure? org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0" [...]
  • 29. What was the failure? [Diagram: Stage → Task, Task, Task, Task]
  • 31. What was the failure? [Diagram: Stage → Task, Task, Task, Task] spark.task.maxFailures=4
  • 34. ERROR executor.Executor: Exception in task ID 2866 java.io.IOException: Filesystem closed
      at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
      at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648)
      at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706)
      at java.io.DataInputStream.read(DataInputStream.java:100)
      at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
      at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
      at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
      at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
      at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
      [...]
  • 35. Spark Architecture
  • 36. YARN Architecture [Diagram labels: Client; Resource Manager; two Node Managers, each with two Containers; Application Master; Process; Process]
  • 37. Spark on YARN Architecture [Diagram labels: Client; Resource Manager; two Node Managers, each with two Containers; Process; Process]
  • 38. Spark on YARN Architecture [Diagram labels: Client; Resource Manager; two Node Managers, each with two Containers; Application Master; Process; Process]
  • 39. spark-submit --executor-memory 2g --master yarn-client --num-executors 2 --executor-cores 2
  • 40. Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.2 GB of 2.1 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
  • 42. spark-submit --executor-memory 2g --master yarn-client --num-executors 2 --executor-cores 2
  • 43. Memory allocation [Diagram labels: yarn.nodemanager.resource.memory-mb; Executor Container; spark.yarn.executor.memoryOverhead (7%) (10% in 1.4); spark.executor.memory; spark.shuffle.memoryFraction (0.4); spark.storage.memoryFraction (0.6)]
  • 44. Sometimes jobs run slow or even… Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
  • 45. java.lang.OutOfMemoryError: GC overhead limit exceeded
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
      [...]
  • 46. GC Stalls
  • 47. Too much spilling! Courtesy of: http://tgnp.me/wp-content/uploads/2014/05/spilled-starbucks.jpg
  • 48. Shuffle Boundaries [Diagram: textFile → map → filter; hadoopFile → groupByKey → map; join → map; label: Shuffle]
  • 49. Most performance issues are in shuffles!
  • 50. Inside a Task: Fetch & Aggregate [Diagram: Block, Block → deserialize → ExternalAppendOnlyMap (key1 -> values, key2 -> values, key3 -> values, key4 -> values) → Sort & Spill (key1 -> values, key2 -> values, key3 -> values)]
  • 51. Inside a Task: Specify partitions rdd.reduceByKey(reduceFunc, numPartitions=1000)
  • 52. Why not set partitions to ∞ ?
  • 53. Excessive parallelism • Overwhelming scheduler overhead • More fetches -> more disk seeks • Driver needs to track state per-task
  • 54. So how to choose? • Easy answer: keep multiplying by 1.5 and see what works
  • 55. Is Spark bad? Courtesy of: https://theferkel.files.wordpress.com/2015/04/250474-breaking-bad-quotes.jpg
  • 56. Thank you
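
The NumberFormatException on slide 6 is raised when a whole comma-separated line is handed to a numeric parser. Below is a minimal Scala sketch of one way to guard against that, assuming the input rows are comma-separated numbers. `ParseSketch` and `parseRow` are hypothetical names introduced here, not part of Spark, and the sketch runs without a cluster.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark API): parse one comma-separated line
// into doubles. Returns None instead of throwing when a field is not numeric.
object ParseSketch {
  def parseRow(line: String): Option[Vector[Double]] =
    Try(line.split(",").map(_.trim.toDouble).toVector).toOption

  def main(args: Array[String]): Unit = {
    val lines = Seq("3.9166,10.2491,-4.0926,-4.4659,0", "12", "not-a-number")

    // Parsing the whole first line as a single number is what fails in the talk.
    println(Try(lines.head.toDouble).isFailure)

    // Keep only the rows that parse; malformed rows are dropped.
    val rows = lines.flatMap(line => parseRow(line))
    println(rows.length)
  }
}
```

Returning an Option per line and flattening drops malformed rows instead of letting the same task fail spark.task.maxFailures times. Whether dropping rows silently is acceptable depends on the job; counting or logging the rejected lines may be the safer choice.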

Editor's Notes

  1. Early on, a colleague of ours sent us this exception (truncated here). This talk is about the kinds of errors you sometimes get when running…
  2. This is probably the most common failure you're going to see. First of all, the punchline in this case is that the problem is your fault. Second, what does all this other stuff mean, and why is Spark reporting it this way?
  3. Let's start with an example program in Spark.
  4. The sum() call launches a job.
  5. Let's start with an example program in Spark.
  6. A chunk of data somewhere: it could be on the Hadoop File System (HDFS) or cached in Spark. Defines the degree of parallelism.
  7. Describes a way of generating input and output partitions. Immutable (very important!). RDDs can depend on other RDDs: most have a single parent; joins have multiple parents. Lineage over replication for fault tolerance. https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  8. Narrow: map, filter. Wide: join, groupByKey.
  9. More details: a job is a DAG of stages. The scheduler creates a set of tasks per stage. Partitions are assigned to tasks.
  10. This is probably the most common failure you're going to see. First of all, the punchline in this case is that the problem is your fault. Second, what does all this other stuff mean, and why is Spark reporting it this way?
  11. Talk about the relationship between stages and tasks.
  12. With all this information in hand, we can come back and interpret this error. We tried to run a job, but it failed because one of its stages failed. Why did that stage fail? Because one of its tasks failed. Tasks will be retried. Mercifully, Spark gives us the exception that caused the most recent failure.
  13. Mercifully, Spark gives us the exception that caused the most recent failure.
  14. Let's review the general Spark architecture. A driver: where the DAG scheduler lives; drives the show; single point of failure. Executors: communicate with the driver; run the tasks created by the driver; think of this as a ThreadPoolExecutor in Java. Pluggable cluster managers: YARN, Mesos, standalone.
  15. Let's review the general Spark architecture.
  16. This shows up in the YARN NodeManager logs.
  17. Mention that this is what happens with a groupByKey or reduceByKey. Show blocks being deserialized into Java objects and placed into the map. Show the spill. With fewer tasks, more of these blocks have to go to the same reducer, and more has to be held in this map.
  18. No. Distributed systems are complicated.
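
The container kill on slide 40 follows from the memory layout on slide 43: the YARN container has to hold spark.executor.memory plus spark.yarn.executor.memoryOverhead. Below is a rough Scala sketch of that arithmetic, using the 7% and 10% fractions from the slide. The 384 MB floor on the overhead is an assumption not stated in the talk, the names `ContainerMemorySketch`, `overheadMb`, and `containerRequestMb` are hypothetical, and YARN rounding the request up to its allocation increment is ignored.

```scala
// Sketch of how large a YARN container an executor asks for.
// Assumption: the overhead never drops below 384 MB (not stated on the slides).
object ContainerMemorySketch {
  val MinOverheadMb = 384

  // Overhead is a fraction of the executor heap, subject to the assumed floor.
  def overheadMb(executorMemoryMb: Int, fraction: Double): Int =
    math.max(MinOverheadMb, (executorMemoryMb * fraction).toInt)

  // Container request = executor heap + overhead (YARN rounding ignored).
  def containerRequestMb(executorMemoryMb: Int, fraction: Double): Int =
    executorMemoryMb + overheadMb(executorMemoryMb, fraction)

  def main(args: Array[String]): Unit = {
    // --executor-memory 2g with the 7% fraction from slide 43
    println(containerRequestMb(2048, 0.07))
    // A larger heap with the 10% fraction the slide gives for Spark 1.4
    println(containerRequestMb(8192, 0.10))
  }
}
```

Under these assumptions, whatever the executor process uses beyond its heap has to fit inside the overhead, otherwise the NodeManager kills the container. When that happens, spark.yarn.executor.memoryOverhead is one setting to look at before raising the heap itself.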