SlideShare ist ein Scribd-Unternehmen logo
1 von 12
Downloaden Sie, um offline zu lesen
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
1
Scalable Detection of Concept Drifts
on Data Streams with
Parallel Adaptive Windowing
Presentation at 18.04.2018
Philipp M. Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
EDBT, Vienna, 2018
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
2
Background:
Adaptive Windowing (ADWIN) [Bifet ’07]
• Maintain an adaptive window which is the basis for computing a ML model.
• Grow the window as long as there is no concept drift detected.
• As a result, the model can rely on growing training data.
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
3
Background:
Adaptive Windowing (ADWIN) [Bifet ’07]
• Maintain an adaptive window which is the basis for computing a ML model.
• Grow the window as long as there is no concept drift detected.
• As a result, the model can rely on growing training data.
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
4
Background:
Adaptive Windowing (ADWIN) [Bifet ’07]
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
5
Background:
Adaptive Windowing (ADWIN) [Bifet ’07]
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
6
Initial Performance Study:
Where do we Spend Execution Time?
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
7
Initial Performance Study:
Where do we Spend Execution Time?
 Parallelize Cut Detection!
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
8
Scope of Parallelization
Cut-Check
Decoupling
• Cut Checks do
not block
processing.
INTRA Cut-Check
Parallelization
• Parallelize the
execution of
one iteration.
INTER Cut-Check
Parallelization
• Run multiple
iterations
concurrently.
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
9
INTRA Cut Check Parallelization
• Halve the latency / execution time of the cut detection.
• Introducing a second thread which moves opposite to the first
thread and performs half of the cut checks.
• Can be extended to more threads.
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
10
INTRA Cut Check Parallelization
• Decouple cut detection from the insertion of stream tuples.
• Execute cut detection on a snapshot histogram.
• Use multi-version concurrency control (MVCC).
• roll back in case of concept drifts.
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
11
Experiment Results
1) Optimistic Adwin is two
orders of magnitude faster
than the original Adwin
implementation and 54
times faster than an
optimized sequential
implementation.
2) Half-Cut Adwin is 10
times faster than the
original Adwin
implementation and
almost 2 times faster than
an optimized sequential
implementation.
3) Both have a low
microseconds latency.
P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of
Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018
12
Open Source Repositories
• Parallel Adwin Open Source Repository
– github.com/TU-Berlin-DIMA/parallel-ADWIN
• Original Adwin Repository
– github.com/abifet/adwin

Weitere ähnliche Inhalte

Was ist angesagt?

AlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, ResnetAlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, ResnetJungwon Kim
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Praxitelis Nikolaos Kouroupetroglou
 
Spectral clustering
Spectral clusteringSpectral clustering
Spectral clusteringSOYEON KIM
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesLogistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesRajat Sharma
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.pptbutest
 
Time Series - Auto Regressive Models
Time Series - Auto Regressive ModelsTime Series - Auto Regressive Models
Time Series - Auto Regressive ModelsBhaskar T
 
Sentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics PlatformSentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics PlatformKNIMESlides
 
Quantile Quantile Plot qq plot
Quantile Quantile Plot qq plot  Quantile Quantile Plot qq plot
Quantile Quantile Plot qq plot Saeed Siddik
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionMartinHogg9
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdftaeseon ryu
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
HML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningHML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningYan Xu
 
5.5 graph mining
5.5 graph mining5.5 graph mining
5.5 graph miningKrish_ver2
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine LearningScaleway
 

Was ist angesagt? (20)

AlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, ResnetAlexNet, VGG, GoogleNet, Resnet
AlexNet, VGG, GoogleNet, Resnet
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
 
Spectral clustering
Spectral clusteringSpectral clustering
Spectral clustering
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesLogistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | Disadvantages
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 
Time Series - Auto Regressive Models
Time Series - Auto Regressive ModelsTime Series - Auto Regressive Models
Time Series - Auto Regressive Models
 
Sentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics PlatformSentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics Platform
 
Quantile Quantile Plot qq plot
Quantile Quantile Plot qq plot  Quantile Quantile Plot qq plot
Quantile Quantile Plot qq plot
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Lecture6 - C4.5
Lecture6 - C4.5Lecture6 - C4.5
Lecture6 - C4.5
 
HML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningHML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep Learning
 
5.5 graph mining
5.5 graph mining5.5 graph mining
5.5 graph mining
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
 

Ähnlich wie Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing

Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...
Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...
Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...Flink Forward
 
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...Jonas Traub
 
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...Jonas Traub
 
Project presentation - Steganographic Application of improved Genetic Shifti...
Project presentation  - Steganographic Application of improved Genetic Shifti...Project presentation  - Steganographic Application of improved Genetic Shifti...
Project presentation - Steganographic Application of improved Genetic Shifti...Vladislav Kaplan
 
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENTREAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENTcscpconf
 
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment csandit
 
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...IRJET Journal
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
Event-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-CommerceEvent-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-Commerceijtsrd
 
José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning in the ...
José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning  in the ...José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning  in the ...
José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning in the ...FIWARE
 
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis LibraryPlume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis LibraryTigerGraph
 
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...IsabellaFilippo
 
GaiaCal2014: Creating and Calibrating LSST Data Product
GaiaCal2014: Creating and Calibrating LSST Data ProductGaiaCal2014: Creating and Calibrating LSST Data Product
GaiaCal2014: Creating and Calibrating LSST Data ProductMario Juric
 
Db2425082511
Db2425082511Db2425082511
Db2425082511IJMER
 
IRJET- Secure Data Access on Distributed Database using Skyline Queries
IRJET- Secure Data Access on Distributed Database using Skyline QueriesIRJET- Secure Data Access on Distributed Database using Skyline Queries
IRJET- Secure Data Access on Distributed Database using Skyline QueriesIRJET Journal
 
New information sources for rain fields
New information sources for rain fieldsNew information sources for rain fields
New information sources for rain fieldsAndreas Scheidegger
 

Ähnlich wie Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing (20)

Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...
Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...
Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Trau...
 
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General ...
 
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Proces...
 
Project presentation - Steganographic Application of improved Genetic Shifti...
Project presentation  - Steganographic Application of improved Genetic Shifti...Project presentation  - Steganographic Application of improved Genetic Shifti...
Project presentation - Steganographic Application of improved Genetic Shifti...
 
XeMPUPiL @ NGCLE@e-Novia 15.11.2017
XeMPUPiL @ NGCLE@e-Novia 15.11.2017XeMPUPiL @ NGCLE@e-Novia 15.11.2017
XeMPUPiL @ NGCLE@e-Novia 15.11.2017
 
An Optics Life
An Optics LifeAn Optics Life
An Optics Life
 
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENTREAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT
REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT
 
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment
 
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp...
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
VILLAS Concept
VILLAS ConceptVILLAS Concept
VILLAS Concept
 
rooter.pdf
rooter.pdfrooter.pdf
rooter.pdf
 
Event-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-CommerceEvent-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-Commerce
 
José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning in the ...
José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning  in the ...José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning  in the ...
José Andrés Muñoz Arcentales - DataToWisdom_TinyML_ Machine Learning in the ...
 
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis LibraryPlume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis Library
 
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
 
GaiaCal2014: Creating and Calibrating LSST Data Product
GaiaCal2014: Creating and Calibrating LSST Data ProductGaiaCal2014: Creating and Calibrating LSST Data Product
GaiaCal2014: Creating and Calibrating LSST Data Product
 
Db2425082511
Db2425082511Db2425082511
Db2425082511
 
IRJET- Secure Data Access on Distributed Database using Skyline Queries
IRJET- Secure Data Access on Distributed Database using Skyline QueriesIRJET- Secure Data Access on Distributed Database using Skyline Queries
IRJET- Secure Data Access on Distributed Database using Skyline Queries
 
New information sources for rain fields
New information sources for rain fieldsNew information sources for rain fields
New information sources for rain fields
 

Mehr von Jonas Traub

Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...
Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...
Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...Jonas Traub
 
Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...
Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...
Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...Jonas Traub
 
Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...
Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...
Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...Jonas Traub
 
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019Jonas Traub
 
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)Jonas Traub
 
Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...
Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...
Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...Jonas Traub
 
Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream SlicingFlink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream SlicingJonas Traub
 
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream ProcessingScotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream ProcessingJonas Traub
 
Efficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLEfficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLJonas Traub
 
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...Jonas Traub
 
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...Jonas Traub
 
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...Jonas Traub
 
I²: Interactive Real-Time Visualization for Streaming Data
I²: Interactive Real-Time Visualization for Streaming DataI²: Interactive Real-Time Visualization for Streaming Data
I²: Interactive Real-Time Visualization for Streaming DataJonas Traub
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)Jonas Traub
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisJonas Traub
 

Mehr von Jonas Traub (15)

Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...
Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...
Definitely not Java! A Hands-on Introduction to Efficient Functional Programm...
 
Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...
Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...
Efficient Data Stream Processing in the Internet of Things - SoftwareCampus A...
 
Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...
Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...
Analyzing Efficient Stream Processing on Modern Hardware (VLDB 2019 Presentat...
 
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
 
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
 
Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...
Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...
Resense: Transparent Record and Replay of Sensor Data in the Internet of Thin...
 
Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream SlicingFlink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
 
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream ProcessingScotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
 
Efficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLEfficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCL
 
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
 
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
 
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
 
I²: Interactive Real-Time Visualization for Streaming Data
I²: Interactive Real-Time Visualization for Streaming DataI²: Interactive Real-Time Visualization for Streaming Data
I²: Interactive Real-Time Visualization for Streaming Data
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
 

Kürzlich hochgeladen

Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 

Kürzlich hochgeladen (20)

Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 

Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing

  • 1. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 1 Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing Presentation at 18.04.2018 Philipp M. Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl EDBT, Vienna, 2018
  • 2. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 2 Background: Adaptive Windowing (ADWIN) [Bifet ’07] • Maintain an adaptive window which is the basis for computing a ML model. • Grow the window as long as there is no concept drift detected. • As a result, the model can rely on growing training data.
  • 3. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 3 Background: Adaptive Windowing (ADWIN) [Bifet ’07] • Maintain an adaptive window which is the basis for computing a ML model. • Grow the window as long as there is no concept drift detected. • As a result, the model can rely on growing training data.
  • 4. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 4 Background: Adaptive Windowing (ADWIN) [Bifet ’07]
  • 5. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 5 Background: Adaptive Windowing (ADWIN) [Bifet ’07]
  • 6. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 6 Initial Performance Study: Where do we Spend Execution Time?
  • 7. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 7 Initial Performance Study: Where do we Spend Execution Time?  Parallelize Cut Detection!
  • 8. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 8 Scope of Parallelization Cut-Check Decoupling • Cut Checks do not block processing. INTRA Cut-Check Parallelization • Parallelize the execution of one iteration. INTER Cut-Check Parallelization • Run multiple iterations concurrently.
  • 9. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 9 INTRA Cut Check Parallelization • Halve the latency / execution time of the cut detection. • Introducing a second thread which moves opposite to the first thread and performs half of the cut checks. • Can be extended to more threads.
  • 10. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 10 INTRA Cut Check Parallelization • Decouple cut detection from the insertion of stream tuples. • Execute cut detection on a snapshot histogram. • Use multi-version concurrency control (MVCC). • roll back in case of concept drifts.
  • 11. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 11 Experiment Results 1) Optimistic Adwin is two orders of magnitude faster than the original Adwin implementation and 54 times faster than an optimized sequential implementation. 2) Half-Cut Adwin is 10 times faster than the original Adwin implementation and almost 2 times faster than an optimized sequential implementation. 3) Both have a low microseconds latency.
  • 12. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 12 Open Source Repositories • Parallel Adwin Open Source Repository – github.com/TU-Berlin-DIMA/parallel-ADWIN • Original Adwin Repository – github.com/abifet/adwin