Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They follow an interpretation-based processing model and do not perform runtime optimizations. This limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE, to enable highly efficient query execution. We extend query compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime re-optimizations. The combination of lightweight statistics gathering with just-in-time compilation enables Grizzly to adjust dynamically to changing data characteristics at runtime. Our experiments show that Grizzly outperforms state-of-the-art SPEs by up to an order of magnitude in throughput.

  1. Grizzly: Efficient Stream Processing Through Adaptive Query Compilation. Philipp M. Grulich¹, Sebastian Breß², Steffen Zeuch¹,², Jonas Traub¹, Janis von Bleichert¹, Zongxiong Chen², Tilmann Rabl³, Volker Markl¹,². Technische Universität Berlin¹, DFKI GmbH², HPI & Universität Potsdam³
  2. SIGMOD 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Limitations of state-of-the-art SPEs: Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.].
  3. Limitations of state-of-the-art SPEs: Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.].
      1. The interpretation-based processing model causes poor cache utilization.
  4. Limitations of state-of-the-art SPEs: Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.].
      1. The interpretation-based processing model causes poor cache utilization.
      2. Upfront partitioning causes high overhead on single nodes.
  5. Limitations of state-of-the-art SPEs: Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.].
      1. The interpretation-based processing model causes poor cache utilization.
      2. Upfront partitioning causes high overhead on single nodes.
      3. SPEs do not react to changing data characteristics at runtime.
  6. Limitations of state-of-the-art SPEs: Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.].
      1. The interpretation-based processing model causes poor cache utilization.
      2. Upfront partitioning causes high overhead on single nodes.
      3. SPEs do not react to changing data characteristics.
      An SPE should be hardware- and data-conscious.
  7. Our Proposal. Grizzly: Efficient Stream Processing Through Adaptive Query Compilation
  8. Grizzly's Core Principles: Query Compilation for Stream Processing; Order-Preserving Task-based Parallelization; Continuous Adaptive Optimizations.
  9. Grizzly's Core Principles. Query Compilation for Stream Processing:
      ● Fuses operators into compact code blocks.
      ● Supports operators that are unique to stream processing.
      Order-Preserving Task-based Parallelization; Continuous Adaptive Optimizations.
  10. Query Compilation. Example query: From(user_purchases).filter(origin='Germany').keyBy(userid).windowBy(TumblingWindow(days(7)), Max(price).as(max_price)).filter(max_price > 42)
  11. Grizzly's Core Principles. Query Compilation for Stream Processing:
      ● Fuses operators into compact code blocks.
      ● How to support the combination of window assignment, function, and trigger?
      Order-Preserving Task-based Parallelization:
      ● Concurrent execution on a global state.
      ● Supports the ordering requirements of stream processing.
      ● Exploits the NUMA configuration.
  12. Task-based Parallelization:
      ● The input stream is processed in small batches (sized to the network buffer).
      ● Pipelines are executed concurrently on a shared state.
  13. Task-based Parallelization. Lock-Free Window Processing:
      ● Allows threads to process windows concurrently.
      ● Lightweight coordination for window triggering.
      NUMA-awareness:
      ● Pre-aggregates window results locally to minimize inter-NUMA-node communication.
  14. Grizzly's Core Principles. Query Compilation for Stream Processing; Order-Preserving Task-based Parallelization; Continuous Adaptive Optimizations:
      ● Feedback loop between code generation and query execution.
      ● Lightweight monitoring at runtime.
  15. Adaptive Re-Optimization. Generic Execution:
      ● Runs without data-dependent optimizations.
      Instrumented Execution:
      ● Injects profiling code to collect statistics (predicate selectivity, value distribution).
      Specialized Execution:
      ● Specializes the operator implementation (predication, fixed hash tables).
  16. Adaptive Optimization. Deoptimization:
      ● Migrates from an optimized to a less optimized execution.
      ● Caused by violated assumptions or changed data characteristics.
  17. Evaluation
  18. Evaluation: System Comparison. Grizzly outperforms state-of-the-art SPEs by up to 10x.
  19. Evaluation: Workloads. Code generation is beneficial for a wide range of workloads.
  20. Evaluation: Adaptive Optimizations. Adaptive optimizations are crucial to reach peak performance.
  21. Summary. Grizzly:
      ● Query compilation for stream processing.
      ● Task-based parallelization that takes ordering requirements into account.
      ● Adaptive optimization to react to changing data characteristics.
      www.nebula.stream, @NebulaStream
  22. Query Compilation
  23. System Architecture
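
To illustrate what slides 9 and 10 mean by fusing operators into a compact code block, the example query could be compiled into a single loop along the following lines. This is a hand-written C++ sketch, not Grizzly's generated code: the `Purchase` and `Result` layouts, the `FusedPipeline` class, millisecond timestamps, in-order arrival, and single-threaded execution are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical schema; the slides do not define the record layout.
struct Purchase {
    std::int64_t timestampMs;  // event time, assumed non-negative and in order
    std::int64_t userId;
    std::string origin;
    double price;
};

struct Result {
    std::int64_t windowStartMs;
    std::int64_t userId;
    double maxPrice;
};

constexpr std::int64_t kWindowMs = 7LL * 24 * 60 * 60 * 1000;  // days(7)

class FusedPipeline {
public:
    // All operators of the query run inside one loop over the input buffer,
    // so no intermediate results are materialized between them.
    void processBuffer(const std::vector<Purchase>& buffer, std::vector<Result>& out) {
        for (const Purchase& p : buffer) {
            // Window assignment and trigger: a record that belongs to a later
            // tumbling window closes the current one.
            const std::int64_t windowStart = (p.timestampMs / kWindowMs) * kWindowMs;
            if (windowStart > currentWindowStart_) {
                emitWindow(out);
                currentWindowStart_ = windowStart;
            }
            // filter(origin='Germany')
            if (p.origin != "Germany") continue;
            // keyBy(userid) combined with Max(price)
            auto it = maxByUser_.find(p.userId);
            if (it == maxByUser_.end()) {
                maxByUser_.emplace(p.userId, p.price);
            } else {
                it->second = std::max(it->second, p.price);
            }
        }
    }

private:
    // Emits the closed window and applies the final filter(max_price > 42).
    void emitWindow(std::vector<Result>& out) {
        for (const auto& entry : maxByUser_) {
            if (entry.second > 42.0) {
                out.push_back(Result{currentWindowStart_, entry.first, entry.second});
            }
        }
        maxByUser_.clear();
    }

    std::int64_t currentWindowStart_ = 0;
    std::unordered_map<std::int64_t, double> maxByUser_;
};
```

Each call to `processBuffer` consumes one input buffer. Results for a window appear in `out` once a record from a later window arrives, and only for users whose weekly maximum exceeds 42.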

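Slides 12 and 13 describe threads that process small batches concurrently on shared window state without locks, and that pre-aggregate locally before touching shared memory. The sketch below shows one generic way to get that behavior with a compare-and-swap loop. It is not Grizzly's actual mechanism: `atomicMax`, `parallelWindowMax`, the integer values, and the use of a per-batch local maximum as a stand-in for NUMA-local pre-aggregation are assumptions, and window triggering is left out.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <thread>
#include <vector>

// Raises `slot` to at least `value` without taking a lock.
inline void atomicMax(std::atomic<std::int64_t>& slot, std::int64_t value) {
    std::int64_t seen = slot.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak stores the current value in `seen`,
    // so the loop retries only while the shared value is still smaller.
    while (seen < value &&
           !slot.compare_exchange_weak(seen, value, std::memory_order_relaxed)) {
    }
}

// Hypothetical helper: worker threads pull batches as tasks and fold them
// into one shared window aggregate.
std::int64_t parallelWindowMax(const std::vector<std::vector<std::int64_t>>& batches,
                               unsigned numThreads) {
    std::atomic<std::int64_t> windowMax{std::numeric_limits<std::int64_t>::min()};
    std::atomic<std::size_t> nextBatch{0};
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&batches, &windowMax, &nextBatch] {
            for (;;) {
                // Task-based scheduling: claim the next unprocessed batch.
                const std::size_t i = nextBatch.fetch_add(1);
                if (i >= batches.size()) break;
                // Pre-aggregate locally, then publish once per batch to
                // reduce traffic on the shared state.
                std::int64_t localMax = std::numeric_limits<std::int64_t>::min();
                for (std::int64_t v : batches[i]) localMax = std::max(localMax, v);
                atomicMax(windowMax, localMax);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return windowMax.load();
}
```

Because a maximum is commutative and associative, the result does not depend on which thread handles which batch. Aggregates that depend on arrival order would need the additional ordering support that the slides mention.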
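
The stages on slides 15 and 16 (generic, instrumented, specialized, and deoptimization) can be pictured as a small state machine around one operator. The following C++ sketch uses a per-key counter as the operator and a dense array as the "fixed hash table" specialization. Everything in it is an assumption for illustration: the `AdaptiveKeyCounter` class, the `kDenseLimit` threshold, profiling for exactly one buffer, and switching variants inside one process rather than recompiling code as Grizzly does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class Stage { Generic, Instrumented, Specialized };

class AdaptiveKeyCounter {
public:
    void process(const std::vector<std::uint32_t>& keys) {
        switch (stage_) {
        case Stage::Generic:  // no data-dependent optimizations
            for (std::uint32_t k : keys) ++generic_[k];
            observedMax_ = 0;
            stage_ = Stage::Instrumented;
            break;
        case Stage::Instrumented:  // same work plus profiling of the key range
            for (std::uint32_t k : keys) {
                ++generic_[k];
                observedMax_ = std::max(observedMax_, k);
            }
            trySpecialize();
            break;
        case Stage::Specialized:  // dense array instead of a hash map
            for (std::size_t i = 0; i < keys.size(); ++i) {
                if (keys[i] >= dense_.size()) {  // assumption violated
                    deoptimize();
                    for (; i < keys.size(); ++i) ++generic_[keys[i]];
                    break;
                }
                ++dense_[keys[i]];
            }
            break;
        }
    }

    std::uint64_t count(std::uint32_t key) const {
        if (stage_ == Stage::Specialized) {
            return key < dense_.size() ? dense_[key] : 0;
        }
        auto it = generic_.find(key);
        return it == generic_.end() ? 0 : it->second;
    }

    Stage stage() const { return stage_; }

private:
    static constexpr std::uint32_t kDenseLimit = 1u << 16;  // hypothetical cutoff

    void trySpecialize() {
        // Fold in keys counted before profiling started.
        std::uint32_t bound = observedMax_;
        for (const auto& e : generic_) bound = std::max(bound, e.first);
        if (bound >= kDenseLimit) {  // key range too wide, stay generic
            stage_ = Stage::Generic;
            return;
        }
        dense_.assign(static_cast<std::size_t>(bound) + 1, 0);
        for (const auto& e : generic_) dense_[e.first] = e.second;
        generic_.clear();
        stage_ = Stage::Specialized;
    }

    void deoptimize() {
        // Migrate state back to the less optimized representation.
        for (std::size_t k = 0; k < dense_.size(); ++k) {
            if (dense_[k] != 0) generic_[static_cast<std::uint32_t>(k)] = dense_[k];
        }
        dense_.clear();
        stage_ = Stage::Generic;
    }

    Stage stage_ = Stage::Generic;
    std::uint32_t observedMax_ = 0;
    std::unordered_map<std::uint32_t, std::uint64_t> generic_;
    std::vector<std::uint64_t> dense_;
};
```

The counts stay correct across every transition because state is migrated whenever the representation changes. Only the cost per record differs between stages.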