As the adoption of Spark Streaming increases rapidly, the community has been asking for greater robustness and scalability from Spark Streaming applications in a wider range of operating environments. To fulfill these demands, we have steadily added a number of features to Spark Streaming. We have added a backpressure mechanism that allows Spark Streaming to dynamically adapt to changes in incoming data rates and maintain the stability of the application. In addition, we are extending Spark’s Dynamic Allocation to Spark Streaming, so that streaming applications can elastically scale based on processing requirements. In my talk, I am going to explore these mechanisms and explain how developers can write robust, scalable and adaptive streaming applications using them. Presented by Tathagata "TD" Das from Databricks.
Building Robust Adaptive Streaming Apps with Spark
1. Building Robust, Adaptive Streaming
Apps with Spark Streaming
Tathagata “TD” Das
@tathadas
Spark Summit East 2016
2. Who am I?
Project Management Committee (PMC) member of Spark
Started Spark Streaming in grad school - AMPLab, UC Berkeley
Current technical lead of Spark Streaming
Software engineer at Databricks
3. Streaming Apps in the Real World
Building a high volume stream processing system in production has many challenges
Fast and Scalable
Spark Streaming is fast, distributed and scalable by design!
Running in production at many places with large clusters and high data volumes
4. Streaming Apps in the Real World
Building a high volume stream processing system in production has many challenges
Fast and Scalable
Easy to program
Spark Streaming makes it easy to express complex streaming business logic
Interoperates with Spark RDDs, Spark SQL DataFrames/Datasets and MLlib
5. Streaming Apps in the Real World
Building a high volume stream processing system in production has many challenges
Fast and Scalable
Easy to program
Fault-tolerant
Spark Streaming is fully fault-tolerant and can provide end-to-end semantic guarantees
See my previous Spark Summit talk for more details
6. Streaming Apps in the Real World
Building a high volume stream processing system in production has many challenges
Fast and Scalable
Easy to program
Fault-tolerant
Adaptive (focus of this talk)
7. Adaptive Streaming Apps
Processing conditions can change dynamically
Sudden surges in data rates
Diurnal variations in processing load
Unexpected slowdowns in downstream data stores
Streaming apps should be able to adapt accordingly
10. Motivation
Stability condition for any streaming app
Receive data only as fast as the system can process it
Stability condition for Spark Streaming’s “micro-batch” model
Finish processing previous batch before next one arrives
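To make the stability condition concrete, here is a small illustrative check in plain Python (not Spark code; the helper name is hypothetical):

```python
def is_stable(batch_processing_times, batch_interval):
    """A micro-batch pipeline is stable only if every batch
    finishes before the next one arrives."""
    return all(t <= batch_interval for t in batch_processing_times)

# Batches take 1s each on a 2s interval: stable.
print(is_stable([1.0, 1.0, 1.0], 2.0))   # True
# Batches take 2.1s each on a 2s interval: unstable.
print(is_stable([2.1, 2.1, 2.1], 2.0))   # False
```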
11. Stable micro-batch operation
Spark Streaming runs micro-batches at fixed batch intervals
[Timeline: batches start at 0s, 2s, 4s, 6s, 8s; each batch takes 1s to process]
batch processing time <= batch interval
Previous batch is processed before next one arrives => stable
12. Unstable micro-batch operation
Spark Streaming runs micro-batches at fixed batch intervals
[Timeline: batches arrive at 0s, 2s, 4s, 6s, 8s; each batch takes 2.1s to process, so scheduling delay builds up]
batch processing time > batch interval
Batches continuously get delayed and backlogged => unstable
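The delay buildup can be sketched numerically (illustrative Python, not Spark code; delays are rounded for readability):

```python
def scheduling_delays(processing_time, batch_interval, num_batches):
    """Each batch can start only after the previous one finishes,
    so delay grows by (processing_time - batch_interval) per batch."""
    delays = []
    delay = 0.0
    for _ in range(num_batches):
        delays.append(round(delay, 2))
        delay = max(0.0, delay + processing_time - batch_interval)
    return delays

# 2.1s batches on a 2s interval: delay grows by 0.1s every batch.
print(scheduling_delays(2.1, 2.0, 5))  # [0.0, 0.1, 0.2, 0.3, 0.4]
```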
14. Backpressure: Dynamic rate limiting
Batch processing times and scheduling delays are used to continuously estimate current processing rates
15. Backpressure: Dynamic rate limiting
[Diagram: batch processing times and scheduling delays feed into a PID controller]
Max stable processing rate estimated with PID controller theory
Well-known feedback mechanism used in industrial control systems
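A simplified sketch of such an update, loosely modeled on Spark's internal PIDRateEstimator (this version keeps only the proportional and integral terms; the function name and constants are illustrative, not Spark's API):

```python
def pid_rate(latest_rate, processing_rate, scheduling_delay,
             batch_interval, k_p=1.0, k_i=0.2, min_rate=100.0):
    """One PID-style update of the ingestion rate limit.

    latest_rate     : current rate limit, records/sec
    processing_rate : records/sec the last batch actually achieved
    scheduling_delay: seconds the last batch waited before starting
    """
    # Proportional term: how far the current limit overshoots
    # what the system can actually process.
    error = latest_rate - processing_rate
    # Integral term: the existing backlog, expressed as a rate.
    historical_error = scheduling_delay * processing_rate / batch_interval
    new_rate = latest_rate - k_p * error - k_i * historical_error
    return max(new_rate, min_rate)

# Ingesting 20000 rec/s but processing only 15000 rec/s, with a 3s
# backlog on 2s batches: the controller cuts the rate limit.
print(pid_rate(20000.0, 15000.0, 3.0, 2.0))  # 10500.0
```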
16. Backpressure: Dynamic rate limiting
[Diagram: the PID controller's output sets rate limits on ingestion]
Accordingly, the system dynamically adapts the limits on the data ingestion rates
17. Backpressure: Dynamic rate limiting
If HDFS ingestion slows down, processing times increase
SS lowers rate limits to slow down receiving from Kafka
Data buffered in Kafka ensures Spark Streaming stays stable
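Backpressure itself is switched on through configuration (available since Spark 1.5); for example, in spark-defaults.conf, a minimal sketch:

```
spark.streaming.backpressure.enabled      true
# Optional cap on the very first batch, before any rate estimate exists
spark.streaming.backpressure.initialRate  20000
```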
20. Elastic Scaling (aka Dynamic Allocation)
Scaling the number of Spark executors according to the load
Spark already supports Dynamic Allocation for batch jobs
Scale down if executors are idle
Scale up if tasks are queueing up
Streaming “micro-batch” jobs need a different scaling policy
No executor is idle for a long time!
21. Scaling policy with Streaming
Load is very low
Lots of idle time between batches
Cluster resources wasted
stable but
inefficient
22. Scaling policy with Streaming
scale down
the cluster
scaling down increases batch processing times
stable operation with fewer wasted resources
stable but
inefficient
stable and
efficient
23. Scaling policy with Streaming
scaling up reduces batch processing times
unstable system becomes stable again
scale up
the cluster
unstable
stable
24. Elastic Scaling
If Kafka gets data faster than what backpressure allows
SS scales up the cluster to increase the processing rate
Data buffered in Kafka starts draining, allowing the app to adapt to any data rate
25. Elastic Scaling: Configuration
Will be available in Spark 2.0
Enabled through SparkConf, set
spark.streaming.dynamicAllocation.enabled = true
More parameters will be in the online programming guide
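As a spark-defaults.conf fragment, the setting above might look like the following (note that, to my knowledge, the batch-oriented spark.dynamicAllocation.enabled must be off for the streaming variant to take effect, since the two mechanisms are mutually exclusive):

```
# Streaming-aware scaling (Spark 2.0+); the batch-oriented
# dynamic allocation must not be enabled at the same time.
spark.dynamicAllocation.enabled            false
spark.streaming.dynamicAllocation.enabled  true
```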
26. Elastic Scaling: Configuration
Make sure there is enough parallelism to take advantage of max cluster size
# of partitions in reduce, join, etc.
# of Kafka partitions
# of receivers
Gives usual fault-tolerance guarantees with files, Kafka Direct, Kinesis, and receiver-based sources with WAL enabled
26
30. Demo
Processing time increases with data rate, until equal to the batch interval
Backpressure limits the ingestion rate below 20k recs/sec to keep the app stable