SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Data Streaming in
Big Data Analysis
Docent lecture
Vincenzo Gulisano, Ph.D.
Vincenzo Gulisano Data streaming in Big Data analysis 1
Vincenzo Gulisano Data streaming in Big Data analysis 2
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Vincenzo Gulisano Data streaming in Big Data analysis 3
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Vincenzo Gulisano Data streaming in Big Data analysis 4
... back in the year 2000
• Continuous processing of data
streams
• Real-time fashion
... Store-then-process is not feasible
Vincenzo Gulisano Data streaming in Big Data analysis 5
Financial applications
Sensor networks
ISPs
~ 2010
Vincenzo Gulisano Data streaming in Big Data analysis 6
2017
Vincenzo Gulisano Data streaming in Big Data analysis 7
Advanced Metering Infrastructures
Vehicular Networks
1. Billions of readings per day cannot be
transferred continuously
2. The latency incurred while transferring data
might undermine the utility of the analysis
3. It is not secure to concentrate all the data in
a single place
4. Privacy can be leaked when giving away
fine-grained data
What do we need then?
• Efficient one-pass analysis
• In memory
• Bounded resources
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Vincenzo Gulisano Data streaming in Big Data analysis 8
Main Memory
Disk
1 Data
Query Processing
3 Query
results
2 Query
Main Memory
Query Processing
Continuous
Query
Data
Query
results
9Data streaming in Big Data analysisVincenzo Gulisano
DBMS vs. DSMS
data stream: unbounded sequence of tuples sharing the same schema
10
Example: vehicles’ speed and position reports
time
Field Field
vehicle id text
time (secs) text
speed (Km/h) double
X coordinate double
Y coordinate double
A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Data streaming in Big Data analysisVincenzo Gulisano
continuous query: Directed Acyclic Graph (DAG) of streams and operators
11
OP
OP
OP
OP OP
OP OP
source op
(1+ out streams)
sink op
(1+ in streams)
stream
op
(1+ in, 1+ out streams)
Data streaming in Big Data analysisVincenzo Gulisano
data streaming operators
• Stateless operators
• do not maintain any state
• one-by-one processing
• Stateful operators
• maintain a state that evolves with the tuples being processed
• produce output tuples that depend on multiple input tuples
12
OP
OP
Data streaming in Big Data analysisVincenzo Gulisano
stateless operators
13
Filter
...
Map
Union
...
Filter / route tuples based on one (or more) conditions
Transform each tuple
Merge multiple streams (with the same schema) into one
Data streaming in Big Data analysisVincenzo Gulisano
stateful operators
14
Aggregate information from multiple tuples
(e.g., compute average speed of the tuples in the last hour)
Compare tuples coming from 2 streams given a certain predicate
(e.g., given the last 5 tuples from each stream, join every pair
reporting the same position)
Aggregate
Join
Data streaming in Big Data analysisVincenzo Gulisano
Since streams are unbounded, windows (over time or tuples)
are defined to bound the portion of tuples to aggregate or join
sample query
For each vehicle, raise an alert if the speed of the latest report is more
than 2 times higher than its average speed in the last 30 days.
15
time
A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Data streaming in Big Data analysisVincenzo Gulisano
16
Field
vehicle id
time (secs)
speed (Km/h)
X coordinate
Y coordinate
Compute average
speed for each
vehicle during the
last 30 days
Aggregate
Field
vehicle id
time (secs)
avg speed (Km/h)
Join
Check
condition
Filter
Field
vehicle id
time (secs)
speed (Km/h)
Join on
vehicle id
Field
vehicle id
time (secs)
avg speed (Km/h)
speed (Km/h)
sample query
Data streaming in Big Data analysisVincenzo Gulisano
A B C
A B C
A B C
A B C
B
C
A
A
A
B
B
Vincenzo Gulisano Data streaming in Big Data analysis 17
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Vincenzo Gulisano Data streaming in Big Data analysis 18
Faulttolerance
Elasticity
Loadbalancing
Determinism
Parallel execution of streaming operators
Vincenzo Gulisano Data streaming in Big Data analysis 19
Parallel execution of streaming applications
OP1 OP2
OP1 OP2
OP1 OP2
=
1) how to route tuples?
2) where to route tuples?
3) How to merge tuples?
4) How many instances to
deploy per operator?
...
Faulttolerance
Elasticity
Loadbalancing
Determinism
Parallel execution of streaming operators
Vincenzo Gulisano Data streaming in Big Data analysis 20
Parallel execution of streaming applications
DDoSdetection
andmitigation
Intrusiondetection
Datavalidation
Differentially
privateaggregation
Vehicularnetworks
analysis
Urbanmobility
analysis
Security and
privacy
IoT
Transportation
sustainability
Synchronization / Data structures
Faulttolerance
Elasticity
Loadbalancing
Determinism
Parallel execution of streaming operators
Vincenzo Gulisano Data streaming in Big Data analysis 21
Parallel execution of streaming applications
Many-core systems / FPGAs
DDoSdetection
andmitigation
Intrusiondetection
Datavalidation
Differentially
privateaggregation
Vehicularnetworks
analysis
Urbanmobility
analysis
Parallel joins
Parallel
aggregates
Joins
modeling
Security and
privacy
IoT
Transportation
sustainability
Vincenzo Gulisano Data streaming in Big Data analysis 22
Synchronization / Data structures
Faulttolerance
Elasticity
Loadbalancing
DeterminismParallel execution of streaming operators
Parallel execution of streaming applications
Many-core systems / FPGAs
DDoSdetection
andmitigation
Intrusiondetection
Datavalidation
Differentially
privateaggregation
Vehicularnetworks
analysis
Urbanmobility
analysis
Security and
privacy
Transportatio
sustainabilit
Distributed execution of streaming
applications
first the hardware, then the query
first the query, then the hardware
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Vincenzo Gulisano Data streaming in Big Data analysis 23
Millions of
sensors
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
• ”Hard-wired” routines
• Real-time decisions
• High-throughput / low-latency
Do I really need to
try surströmming?
(... NO)
Danger!!!
Run!!!
(surströmming
can opened)
Vincenzo Gulisano Data streaming in Big Data analysis 24
What traffic
congestion
patterns can I
observe
frequently?
Don’t take
over, car in
opposite lane!
Vincenzo Gulisano Data streaming in Big Data analysis 25
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
• ”Hard-wired” routines
• Real-time decisions
• High-throughput / low-latency
Millions of
sensors
+ + +
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Vincenzo Gulisano Data streaming in Big Data analysis 26
Bibliography
1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced
metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642.
2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures."
Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016.
3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering
infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014.
4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced
metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International
Publishing, 2014.
5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and
perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006.
6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002.
7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem
transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing.
ACM, 2014.
8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big
Data), 2014 IEEE International Conference on. IEEE, 2014.
Data streaming in Big Data analysis 27Vincenzo Gulisano
Bibliography
9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international
conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent
Transportation Systems 16.2 (2015): 865-873.
11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid
Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012.
12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded
sensing systems for energy-efficiency in building. ACM, 2010.
13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010
IEEE 30th International Conference on. IEEE, 2010.
14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD
Record 34.4 (2005): 42-47.
15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC
workshop on Mobile cloud computing. ACM, 2012.
Data streaming in Big Data analysis 28Vincenzo Gulisano
Bibliography
16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica,
2012.
17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th
ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the
2016 IEEE Intelligent Vehicles Symposium (IV16).
19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed
Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.
20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and
Distributed Systems 23.12 (2012): 2351-2365.
21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003.
Proceedings. 19th International Conference on. IEEE, 2003.
Data streaming in Big Data analysis 29Vincenzo Gulisano
Bibliography
22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of
the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014.
23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of
the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015.
24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators."
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015
IEEE International Conference on. IEEE, 2015.
26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings
of the 7th ACM international conference on Distributed event-based systems. ACM, 2013.
27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient
elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming. ACM, 2016.
28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on
Database Systems (TODS) 33.1 (2008): 3.
29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state
management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
Data streaming in Big Data analysis 30Vincenzo Gulisano
Bibliography
30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of
Computation. Springer Berlin Heidelberg, 2008.
31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on
Theory of computing. ACM, 2010.
32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the
sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013.
Data streaming in Big Data analysis 31Vincenzo Gulisano

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Philip Fisher-Ogden
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-WarehouseAbdul Aslam
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics ArchitectureArvind Sathi
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 

Was ist angesagt? (20)

HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-Warehouse
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 

Ähnlich wie Data Streaming in Big Data Analysis

Data Streaming in IoT and Big Data Analytics
Data Streaming in  IoT and Big Data AnalyticsData Streaming in  IoT and Big Data Analytics
Data Streaming in IoT and Big Data AnalyticsVincenzo Gulisano
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsVincenzo Gulisano
 
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...Demetris Trihinas
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things PayamBarnaghi
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Smart Cities: How are they different?
Smart Cities: How are they different? Smart Cities: How are they different?
Smart Cities: How are they different? PayamBarnaghi
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh BellurSTS FORUM 2016
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thessaloniki
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysYork University
 
Dynamic Data Center concept
Dynamic Data Center concept  Dynamic Data Center concept
Dynamic Data Center concept Miha Ahronovitz
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City ApplicationsAmit Sheth
 
Semantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream DataSemantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream DataOscar Corcho
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Boris Adryan
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPayamBarnaghi
 

Ähnlich wie Data Streaming in Big Data Analysis (20)

Data Streaming in IoT and Big Data Analytics
Data Streaming in  IoT and Big Data AnalyticsData Streaming in  IoT and Big Data Analytics
Data Streaming in IoT and Big Data Analytics
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
CINET: A Cyber-Infrastructure for Network Science Overview
CINET: A Cyber-Infrastructure for Network Science OverviewCINET: A Cyber-Infrastructure for Network Science Overview
CINET: A Cyber-Infrastructure for Network Science Overview
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
 
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T...
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Smart Cities: How are they different?
Smart Cities: How are they different? Smart Cities: How are they different?
Smart Cities: How are they different?
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh Bellur
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in Highways
 
Dynamic Data Center concept
Dynamic Data Center concept  Dynamic Data Center concept
Dynamic Data Center concept
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
 
Semantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream DataSemantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream Data
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City Applications
 

Mehr von Vincenzo Gulisano

Tutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data StreamingTutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data StreamingVincenzo Gulisano
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
Performance Modeling of Stream Joins
Performance Modeling of Stream JoinsPerformance Modeling of Stream Joins
Performance Modeling of Stream JoinsVincenzo Gulisano
 
The data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesThe data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
 
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinVincenzo Gulisano
 
The benefits of fine-grained synchronization in deterministic and efficient ...
The benefits of fine-grained synchronization in  deterministic and efficient ...The benefits of fine-grained synchronization in  deterministic and efficient ...
The benefits of fine-grained synchronization in deterministic and efficient ...Vincenzo Gulisano
 

Mehr von Vincenzo Gulisano (7)

Tutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data StreamingTutorial: The Role of Event-Time Analysis Order in Data Streaming
Tutorial: The Role of Event-Time Analysis Order in Data Streaming
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
Strel streaming
Strel streamingStrel streaming
Strel streaming
 
Performance Modeling of Stream Joins
Performance Modeling of Stream JoinsPerformance Modeling of Stream Joins
Performance Modeling of Stream Joins
 
The data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesThe data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architectures
 
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
 
The benefits of fine-grained synchronization in deterministic and efficient ...
The benefits of fine-grained synchronization in  deterministic and efficient ...The benefits of fine-grained synchronization in  deterministic and efficient ...
The benefits of fine-grained synchronization in deterministic and efficient ...
 

Kürzlich hochgeladen

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 

Kürzlich hochgeladen (20)

CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Data Streaming in Big Data Analysis

  • 1. Data Streaming in Big Data Analysis Docent lecture Vincenzo Gulisano, Ph.D. Vincenzo Gulisano Data streaming in Big Data analysis 1
  • 2. Vincenzo Gulisano Data streaming in Big Data analysis 2
  • 3. Agenda •Why data streaming? •How does it work? •Past, current and future research •Conclusions •Bibliography Vincenzo Gulisano Data streaming in Big Data analysis 3
  • 4. Agenda •Why data streaming? •How does it work? •Past, current and future research •Conclusions •Bibliography Vincenzo Gulisano Data streaming in Big Data analysis 4
  • 5. ... back in the year 2000 • Continuous processing of data streams • Real-time fashion ... Store-then-process is not feasible Vincenzo Gulisano Data streaming in Big Data analysis 5 Financial applications Sensor networks ISPs
  • 6. ~ 2010 Vincenzo Gulisano Data streaming in Big Data analysis 6
  • 7. 2017 Vincenzo Gulisano Data streaming in Big Data analysis 7 Advanced Metering Infrastructures Vehicular Networks 1. Billions of readings per day cannot be transferred continuously 2. The latency incurred while transferring data might undermine the utility of the analysis 3. It is not secure to concentrate all the data in a single place 4. Privacy can be leaked when giving away fine-grained data What do we need then? • Efficient one-pass analysis • In memory • Bounded resources
  • 8. Agenda •Why data streaming? •How does it work? •Past, current and future research •Conclusions •Bibliography Vincenzo Gulisano Data streaming in Big Data analysis 8
  • 9. Main Memory Disk 1 Data Query Processing 3 Query results 2 Query Main Memory Query Processing Continuous Query Data Query results 9Data streaming in Big Data analysisVincenzo Gulisano DBMS vs. DSMS
  • 10. data stream: unbounded sequence of tuples sharing the same schema 10 Example: vehicles’ speed and position reports time Field Field vehicle id text time (secs) text speed (Km/h) double X coordinate double Y coordinate double A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 Data streaming in Big Data analysisVincenzo Gulisano
  • 11. continuous query: Directed Acyclic Graph (DAG) of streams and operators 11 OP OP OP OP OP OP OP source op (1+ out streams) sink op (1+ in streams) stream op (1+ in, 1+ out streams) Data streaming in Big Data analysisVincenzo Gulisano
  • 12. data streaming operators • Stateless operators • do not maintain any state • one-by-one processing • Stateful operators • maintain a state that evolves with the tuples being processed • produce output tuples that depend on multiple input tuples 12 OP OP Data streaming in Big Data analysisVincenzo Gulisano
  • 13. stateless operators 13 Filter ... Map Union ... Filter / route tuples based on one (or more) conditions Transform each tuple Merge multiple streams (with the same schema) into one Data streaming in Big Data analysisVincenzo Gulisano
  • 14. stateful operators 14 Aggregate information from multiple tuples (e.g., compute average speed of the tuples in the last hour) Compare tuples coming from 2 streams given a certain predicate (e.g., given the last 5 tuples from each stream, join every pair reporting the same position) Aggregate Join Data streaming in Big Data analysisVincenzo Gulisano Since streams are unbounded, windows (over time or tuples) are defined to bound the portion of tuples to aggregate or join
  • 15. sample query For each vehicle, raise an alert if the speed of the latest report is more than 2 times higher than its average speed in the last 30 days. 15 time A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 Data streaming in Big Data analysisVincenzo Gulisano
  • 16. 16 Field vehicle id time (secs) speed (Km/h) X coordinate Y coordinate Compute average speed for each vehicle during the last 30 days Aggregate Field vehicle id time (secs) avg speed (Km/h) Join Check condition Filter Field vehicle id time (secs) speed (Km/h) Join on vehicle id Field vehicle id time (secs) avg speed (Km/h) speed (Km/h) sample query Data streaming in Big Data analysisVincenzo Gulisano
  • 17. A B C A B C A B C A B C B C A A A B B Vincenzo Gulisano Data streaming in Big Data analysis 17
  • 18. Agenda •Why data streaming? •How does it work? •Past, current and future research •Conclusions •Bibliography Vincenzo Gulisano Data streaming in Big Data analysis 18
  • 19. Faulttolerance Elasticity Loadbalancing Determinism Parallel execution of streaming operators Vincenzo Gulisano Data streaming in Big Data analysis 19 Parallel execution of streaming applications OP1 OP2 OP1 OP2 OP1 OP2 = 1) how to route tuples? 2) where to route tuples? 3) How to merge tuples? 4) How many instances to deploy per operator? ...
  • 20. Faulttolerance Elasticity Loadbalancing Determinism Parallel execution of streaming operators Vincenzo Gulisano Data streaming in Big Data analysis 20 Parallel execution of streaming applications DDoSdetection andmitigation Intrusiondetection Datavalidation Differentially privateaggregation Vehicularnetworks analysis Urbanmobility analysis Security and privacy IoT Transportation sustainability
  • 21. Synchronization / Data structures Faulttolerance Elasticity Loadbalancing Determinism Parallel execution of streaming operators Vincenzo Gulisano Data streaming in Big Data analysis 21 Parallel execution of streaming applications Many-core systems / FPGAs DDoSdetection andmitigation Intrusiondetection Datavalidation Differentially privateaggregation Vehicularnetworks analysis Urbanmobility analysis Parallel joins Parallel aggregates Joins modeling Security and privacy IoT Transportation sustainability
  • 22. Vincenzo Gulisano Data streaming in Big Data analysis 22 Synchronization / Data structures Faulttolerance Elasticity Loadbalancing DeterminismParallel execution of streaming operators Parallel execution of streaming applications Many-core systems / FPGAs DDoSdetection andmitigation Intrusiondetection Datavalidation Differentially privateaggregation Vehicularnetworks analysis Urbanmobility analysis Security and privacy Transportatio sustainabilit Distributed execution of streaming applications first the hardware, then the query first the query, then the hardware
  • 23. Agenda •Why data streaming? •How does it work? •Past, current and future research •Conclusions •Bibliography Vincenzo Gulisano Data streaming in Big Data analysis 23
  • 24. Millions of sensors • Store information • Iterate multiple times over data • Think, do not rush through decisions • ”Hard-wired” routines • Real-time decisions • High-throughput / low-latency Do I really need to try surströmming? (... NO) Danger!!! Run!!! (surströmming can opened) Vincenzo Gulisano Data streaming in Big Data analysis 24
  • 25. What traffic congestion patterns can I observe frequently? Don’t take over, car in opposite lane! Vincenzo Gulisano Data streaming in Big Data analysis 25 • Store information • Iterate multiple times over data • Think, do not rush through decisions • ”Hard-wired” routines • Real-time decisions • High-throughput / low-latency Millions of sensors + + +
  • 26. Agenda •Why data streaming? •How does it work? •Past, current and future research •Conclusions •Bibliography Vincenzo Gulisano Data streaming in Big Data analysis 26
  • 27. Bibliography 1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642. 2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures." Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016. 3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014. 4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International Publishing, 2014. 5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006. 6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002. 7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing. ACM, 2014. 8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014. Data streaming in Big Data analysis 27Vincenzo Gulisano
  • 28. Bibliography 9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment, 2004. 10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent Transportation Systems 16.2 (2015): 865-873. 11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012. 12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded sensing systems for energy-efficiency in building. ACM, 2010. 13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on. IEEE, 2010. 14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD Record 34.4 (2005): 42-47. 15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012. Data streaming in Big Data analysis 28Vincenzo Gulisano
  • 29. Bibliography 16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica, 2012. 17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016. 18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV16). 19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016. 20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365. 21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003. Proceedings. 19th International Conference on. IEEE, 2003. Data streaming in Big Data analysis 29Vincenzo Gulisano
  • 30. Bibliography 22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014. 23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015. 24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016. 25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015. 26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings of the 7th ACM international conference on Distributed event-based systems. ACM, 2013. 27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 2016. 28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on Database Systems (TODS) 33.1 (2008): 3. 29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013. Data streaming in Big Data analysis 30Vincenzo Gulisano
  • 31. Bibliography 30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of Computation. Springer Berlin Heidelberg, 2008. 31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010. 32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013. Data streaming in Big Data analysis 31Vincenzo Gulisano

Hinweis der Redaktion

  1. Before we start... questions and please notice
  2. making peace with DBs...