Drizzle is a low latency execution engine for Apache Spark
that is targeted at stream processing and iterative workloads.
Currently, Spark uses a BSP computation model, and notifies the
scheduler at the end of each task. Invoking the scheduler at the end of each task adds overheads and results in decreased throughput and increased latency. In Drizzle, we introduce group scheduling, where multiple batches (or a group) of computation are scheduled at once.
This helps decouple the granularity of task execution from scheduling and amortize the costs of task serialization and launch. Our experiments on a 128 node EC2 cluster show that Drizzle can achieve end-to-end streaming latencies of less than 100ms and can get up to 3.5x lower latency than Spark Streaming. Compared to Apache Flink, a record-at-a-time streaming system, we show that Drizzle can recover around 4x faster from failures and that Drizzle has up to 13x lower latency during recovery.
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shivaram Venkataraman
1. DRIZZLE: Low latency execution for Apache Spark
Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout
2. Who am I?
PhD candidate, AMPLab UC Berkeley
Dissertation: System design for large scale machine learning
Apache Spark PMC Member. Contributions to Spark core, MLlib, SparkR
3. Low latency: SPARK STREAMING
“Delivering low latency, high throughput, and stability simultaneously: Right now, our own tests indicate you can get at most two of these characteristics out of Spark Streaming at the same time.”
From https://goo.gl/wGCrtE
“How to choose right DStream batch interval”
From https://goo.gl/6UX0FW
“Getting the best performance out of a Spark Streaming application on a cluster requires a bit of tuning… Reducing the processing time of each batch of data by efficiently using cluster resources. Setting the right batch size such that the batches of data can be processed as fast as they are received….” From spark.apache.org/docs/latest/streaming-programming-guide
15. DAG scheduling
Assign tasks to hosts using
(a) locality preferences
(b) straggler mitigation
(c) fair sharing etc.
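The assignment criteria above can be sketched as a toy scheduler. This is a minimal Python sketch, assuming a hypothetical `assign_tasks` helper (not Spark's actual scheduler API), showing locality preferences with a least-loaded fallback as a crude form of fair sharing:

```python
def assign_tasks(tasks, hosts):
    """Assign each task to its preferred host when possible,
    falling back to the least-loaded host otherwise."""
    load = {h: 0 for h in hosts}
    assignment = {}
    for task_id, preferred in tasks:
        # (a) locality preference: use the preferred host if it exists
        if preferred in load:
            host = preferred
        else:
            # fallback: pick the least-loaded host (crude fair sharing)
            host = min(load, key=load.get)
        assignment[task_id] = host
        load[host] += 1
    return assignment

tasks = [("t1", "Host1"), ("t2", "Host2"), ("t3", None), ("t4", "Host1")]
print(assign_tasks(tasks, ["Host1", "Host2"]))
```

A real scheduler would also weigh straggler mitigation (speculative copies of slow tasks), which is omitted here for brevity.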
[Diagram: the Driver's Scheduler uses host metadata to serialize tasks and launch them on Host1 and Host2]
16. SCALING BATCH COMPUTATION
Cluster: 4-core r3.xlarge machines. Workload: sum of 10k numbers per core
Median-task time breakdown
[Chart: median task-time breakdown (Compute + Data Transfer, Task Fetch, Scheduler Delay) vs. number of machines (4 to 128); time in ms, 0 to 250]
17. DAG scheduling
Assign tasks to hosts using
(a) locality preferences
(b) straggler mitigation
(c) fair sharing etc.
[Same Driver/Scheduler task-launch diagram as slide 15]
Same DAG structure
for many iterations
Can reuse scheduling decisions
18. GROUP scheduling
Schedule a group of iterations at once
Fault tolerance and scheduling decisions at group boundaries
[Diagram: 1 stage in each iteration, group = 2]
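A back-of-the-envelope model (all numbers assumed, not from the talk) of why scheduling a group of iterations at once helps: the scheduler invocation cost is paid once per group rather than once per iteration, so per-iteration overhead shrinks as the group grows:

```python
SCHED_OVERHEAD_MS = 5.0  # assumed cost of one scheduler invocation
EXEC_MS = 2.0            # assumed per-iteration execution time

def time_per_iteration(group_size, iterations=100):
    """Average per-iteration time when scheduling `group_size`
    iterations per scheduler invocation."""
    groups = -(-iterations // group_size)  # ceiling division
    total = groups * SCHED_OVERHEAD_MS + iterations * EXEC_MS
    return total / iterations

for g in (1, 10, 100):
    print(g, time_per_iteration(g))  # overhead amortizes: 7.0, 2.5, 2.05
```

With these assumed numbers, group = 1 pays the full 5 ms per iteration, while group = 100 amortizes it to 0.05 ms, matching the shape of the results on the next slide.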
19. How much does this help ?
[Chart: time per iteration (ms, log scale 1 to 1000) vs. number of machines (4 to 128) for Apache Spark, Drizzle-10, Drizzle-50, and Drizzle-100]
Workload: Sum of 10k numbers per-core
Single-stage job, 100 iterations, varying Drizzle group size
26. GROUP scheduling trade-offs
group = 1 → batch processing: higher overhead, smaller window for fault tolerance
group = N → parallel operators: lower overhead, larger window for fault tolerance
27. GROUP scheduling – AUTO TUNING
Goal: smallest group size such that overhead stays below a fixed threshold
Tuning algorithm
- Measure scheduler delay and execution time per group
- If overhead > threshold, multiplicatively increase group size
- If overhead < threshold, additively decrease group size
Similar to AIMD schemes used in TCP congestion control
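The tuning loop above can be sketched as follows; the threshold value, doubling factor, and step size are assumptions for illustration, not values from the talk:

```python
THRESHOLD = 0.1  # assumed target: max fraction of time spent in the scheduler

def next_group_size(group, sched_delay_ms, exec_ms):
    """One step of the tuning loop: grow the group multiplicatively while
    scheduling overhead dominates, shrink it additively otherwise."""
    overhead = sched_delay_ms / (sched_delay_ms + exec_ms)
    if overhead > THRESHOLD:
        return group * 2          # multiplicative increase
    return max(1, group - 1)      # additive decrease, never below 1

g = 1
g = next_group_size(g, 5.0, 20.0)  # overhead 0.2 > 0.1 -> group doubles to 2
g = next_group_size(g, 5.0, 60.0)  # overhead ~0.077 < 0.1 -> shrinks to 1
print(g)
```

Note the asymmetry matches the slide: the increase is multiplicative and the decrease additive, so the group size converges from below toward the smallest size that keeps overhead under the threshold.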
29. MLLIB ALGORITHMS
Iterative patterns → Gradient Descent, PCA, …
Similar structure to streaming!
Model stored and updated as shared state
Parameter server integration
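A toy sketch of the iterative pattern above (not Drizzle's or MLlib's actual API): the model is kept as shared state, and each iteration applies the averaged gradients computed by parallel tasks, parameter-server style:

```python
def sgd_step(weights, gradients, lr=0.1):
    """Apply the average of the per-task gradients to the shared model."""
    avg = [sum(dim) / len(gradients) for dim in zip(*gradients)]
    return [w - lr * a for w, a in zip(weights, avg)]

weights = [0.0, 0.0]               # shared model state
grads = [[1.0, 3.0], [3.0, 5.0]]   # gradients from two parallel tasks
weights = sgd_step(weights, grads)
print(weights)  # -> [-0.2, -0.4]
```

Each iteration has the same shape as one streaming micro-batch (parallel tasks feeding a small state update), which is why group scheduling applies to both.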
35. OPEN SOURCE UPDATE
Spark Scheduler Improvements
- SPARK-18890, SPARK-18836, SPARK-19485
- Addresses serialization, RPC bottlenecks etc.
Design discussion to integrate Drizzle: SPARK-19487
Open source code at: https://github.com/amplab/drizzle-spark
36. Conclusion
Low latency during execution and while adapting
Drizzle: Decouple execution from centralized scheduling
Amortize overheads using group scheduling, pre-scheduling
Shivaram Venkataraman
shivaram@cs.berkeley.edu
Source Code: https://github.com/amplab/drizzle-spark