Grizzly: Efficient Stream Processing
Through Adaptive Query Compilation
Philipp M. Grulich¹, Sebastian Breß², Steffen Zeuch¹², Jonas Traub¹,
Janis von Bleichert¹, Zongxiong Chen², Tilmann Rabl³, Volker Markl¹²
Technische Universität Berlin¹, DFKI GmbH², HPI & Universität Potsdam³
Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al.
Limitations of state-of-the-art SPEs
Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.]
1. Interpretation-based processing model causes poor cache utilization.
2. Upfront partitioning causes high overhead on single nodes.
3. SPEs do not react to changing data characteristics at runtime.
An SPE should be hardware- and data-conscious.
Our Proposal
Grizzly: Efficient Stream Processing Through Adaptive Query Compilation
Grizzly’s Core Principles
● Query Compilation for Stream Processing
● Order-Preserving Task-based Parallelization
● Continuous Adaptive Optimizations
Query Compilation for Stream Processing
● Fuses operators into compact code blocks.
● Supports operators that are unique to stream processing.
Query Compilation
From(user_purchases)
.filter(origin='Germany')
.keyBy(userid)
.windowBy(TumblingWindow(days(7)), Max(price).as(max_price))
.filter(max_price > 42)
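To make operator fusion concrete, the following is a minimal Python sketch of what a single fused loop for the query above could look like. It is not Grizzly's generated code: the record layout `(timestamp_s, userid, origin, price)`, the function name `fused_pipeline`, and the assumption that records arrive in timestamp order are all hypothetical choices made for illustration.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60  # tumbling window length: 7 days


def fused_pipeline(purchases):
    """Run filter, keyBy, window max, and the final filter in one pass.

    `purchases` is an iterable of (timestamp_s, userid, origin, price)
    tuples, assumed to arrive in timestamp order (hypothetical layout).
    """
    results = []
    window_max = {}        # userid -> running max(price) in the open window
    current_window = None  # index of the window that is currently open

    def close_window(window_index):
        # The second filter (max_price > 42) is applied when the window fires.
        for userid, max_price in sorted(window_max.items()):
            if max_price > 42:
                results.append((window_index, userid, max_price))
        window_max.clear()

    for ts, userid, origin, price in purchases:
        window_index = ts // WEEK_SECONDS
        if current_window is None:
            current_window = window_index
        elif window_index != current_window:
            close_window(current_window)
            current_window = window_index
        if origin != "Germany":  # first filter, inlined into the loop
            continue
        previous = window_max.get(userid)
        if previous is None or price > previous:
            window_max[userid] = price

    if current_window is not None:
        close_window(current_window)  # flush the last open window
    return results
```

Each record is touched once and no intermediate stream is materialized between operators, which is the property the fused code block is meant to provide.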
Query Compilation for Stream Processing
● How to support the combination of window assignment, window function, and trigger?
Order-Preserving Task-based Parallelization
● Concurrent execution on a global state.
● Supports the order requirements of stream processing.
● Exploits the NUMA configuration.
Task-based Parallelization
● Input stream is processed in small batches (sized to network buffer).
● Pipelines are executed concurrently on a shared state.
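The two points above can be sketched as follows, assuming a simple per-key sum as the pipeline. This is an illustration only: `run_tasks`, `batch_size`, and `num_workers` are hypothetical names, and a plain lock stands in for whatever synchronization the real system uses on its shared state.

```python
import threading
from queue import Empty, Queue


def run_tasks(records, batch_size=4, num_workers=3):
    """Split (key, value) records into batches and sum values per key."""
    tasks = Queue()
    for start in range(0, len(records), batch_size):
        tasks.put(records[start:start + batch_size])  # one task per batch

    shared_state = {}              # key -> running sum, shared by all workers
    state_lock = threading.Lock()  # stand-in for the engine's synchronization

    def worker():
        while True:
            try:
                batch = tasks.get_nowait()
            except Empty:
                return  # no tasks left
            for key, value in batch:
                with state_lock:
                    shared_state[key] = shared_state.get(key, 0) + value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared_state
```

Any worker can pick up any batch, so no upfront partitioning of the stream by key is required; the cost moves to coordinating access to the shared state.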
Lock-Free Window Processing
● Allows threads to process windows concurrently.
● Lightweight coordination for window triggering.
NUMA-awareness
● Pre-aggregates window results locally to minimize inter-NUMA-node communication.
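The idea of local pre-aggregation can be sketched in Python as below. Python cannot pin threads or memory to NUMA nodes, so a worker-local dictionary stands in for a node-local buffer here; `pre_aggregate` and `window_max` are hypothetical names, and max per key is an assumed aggregate.

```python
from concurrent.futures import ThreadPoolExecutor


def pre_aggregate(batch):
    """Build a worker-local partial max per key without touching shared state."""
    local = {}
    for key, value in batch:
        if key not in local or value > local[key]:
            local[key] = value
    return local


def window_max(batches, num_workers=2):
    """Compute max(value) per key for one window from several batches."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(pre_aggregate, batches))

    merged = {}
    for partial in partials:  # one merge per partial result, not per record
        for key, value in partial.items():
            if key not in merged or value > merged[key]:
                merged[key] = value
    return merged
```

Only the compact partial results cross the boundary between workers, which is the effect the slide attributes to pre-aggregating locally.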
Continuous Adaptive Optimizations
● Feedback loop between code generation and query execution.
● Lightweight monitoring at runtime.
Adaptive Re-Optimization
Generic Execution:
● Without data-dependent optimizations.
Instrumented Execution:
● Injects profiling code to collect statistics (predicate selectivity, value distribution).
Specialized Execution:
● Specializes operator implementations (predication, fixed hash-tables).
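As a rough illustration of how profiled selectivity could drive specialization, the sketch below measures the selectivity of a predicate on a sample and then picks between a branching and a predicated filter. The function names and the `low`/`high` cut-offs are hypothetical; in CPython the two variants do not differ in branch behavior, so the code only shows the shape of the decision, not a performance effect.

```python
def count_branching(values, threshold):
    """Count matches with an explicit branch per record."""
    count = 0
    for value in values:
        if value > threshold:
            count += 1
    return count


def count_predicated(values, threshold):
    """Count matches by adding the predicate result (True == 1, False == 0)."""
    count = 0
    for value in values:
        count += value > threshold
    return count


def profile_selectivity(sample, threshold):
    """Instrumented stage: fraction of sampled records that pass the predicate."""
    if not sample:
        return 0.0
    passed = sum(1 for value in sample if value > threshold)
    return passed / len(sample)


def choose_variant(selectivity, low=0.1, high=0.9):
    """Hypothetical rule: keep the branch only when its outcome is predictable."""
    if selectivity < low or selectivity > high:
        return count_branching
    return count_predicated
```

A specialized stage would then be generated with the chosen variant, for example `choose_variant(profile_selectivity(sample, 42))`.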
Adaptive Optimization
Deoptimization:
● Migrates from optimized to less optimized execution.
● Caused by violated assumptions or changed data characteristics.
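A small Python sketch of deoptimization, under the assumption that the specialized variant uses a fixed-size array indexed by integer keys in a known range. The class `AdaptiveSum` and its guard are hypothetical and only illustrate the pattern: run the fast path while the assumption holds, and migrate state to a generic structure once it is violated.

```python
class AdaptiveSum:
    """Per-key sum that starts specialized and deoptimizes on a violated assumption."""

    def __init__(self, assumed_max_key):
        # Specialized state: fixed-size array, valid while 0 <= key <= assumed_max_key.
        self.array = [0] * (assumed_max_key + 1)
        self.generic = None       # dictionary used after deoptimization
        self.deoptimized = False

    def add(self, key, value):
        if not self.deoptimized:
            if 0 <= key < len(self.array):
                self.array[key] += value  # fast path, assumption holds
                return
            self._deoptimize()            # guard failed: key outside assumed range
        self.generic[key] = self.generic.get(key, 0) + value

    def _deoptimize(self):
        # Migrate partial results so nothing is lost when switching variants.
        # Keys whose sum is zero are treated as absent in this sketch.
        self.generic = {k: v for k, v in enumerate(self.array) if v != 0}
        self.array = None
        self.deoptimized = True

    def result(self):
        if self.deoptimized:
            return dict(self.generic)
        return {k: v for k, v in enumerate(self.array) if v != 0}
```

For example, with `AdaptiveSum(3)` the keys 0 to 3 stay on the array path, while the first key outside that range triggers the switch to the dictionary.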
Evaluation
Evaluation: System Comparison
Grizzly outperforms state-of-the-art SPEs by up to 10x.
Evaluation: Workloads
Code generation is beneficial for a wide range of workloads.
Evaluation: Adaptive Optimizations
Adaptive optimizations are crucial to reach peak performance.
Summary
Grizzly:
● Query compilation for stream processing.
● Task-based parallelization that takes ordering requirements into account.
● Adaptive optimization to react to changing data characteristics.
www.nebula.stream
@NebulaStream
[Figure-only slide: Query Compilation]
[Figure-only slide: System Architecture]
