SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Real-time stream processing
for Big Data
Presented by Luay AL-Assadi
INTRODUCTION
Rise of the web 2.0 and the Internet of things.
 Huge amounts of data. (ex sensors, social media, online marketing).
 Track all kinds of information that are only valuable for a short time and therefore have to be
processed immediately.
 Monitoring user activity to optimize product or video recommendations for the current user
context.
Traditional batch-oriented approaches.
 Complex Event Processing (CEP) engines and DBMSs.
Distributed data processing.
 MapReduce.
Real-time analytics: Big Data in motion
 Real time Data infrastructure:
 Built from distributed components.
 Communicate via asynchronous network.
 Engineered on top of the JVM(Java Virtual Machine).
 Real time Big Data Basic Architecture Model:
 Collecting data from various places.
 Moving data to streaming layer.
 Analyze data in stream processor.
 Forwarding outputs to serving layer.
Real-time analytics: Big Data in motion
 Big Data Architecture Model:
Collecting Data
Streaming Data
Batch processing
Store Data
Stream processing
Serving Layer
Lambda Architecture
Real-time analytics: Big Data in motion
 Big Data Architecture Models:
Collecting Data
Streaming Data
Stream processing
Serving Layer
Kappa Architecture
Store, retain Data
Real-time streamers
 RabbitMQ.
 Broker centric, message Acknowledgement.
 focused around delivery guarantees between producers and consumers.
 fall over if your consumers were too slow.
Producer ConsumerBROKER
Message
Ack
Real-time streamers
 Kafka.
Producer centric.
Online / Offline consumers.
Use Zookeeper to reliably maintain their state across a cluster.
Real-time processors:
Latency Throughput & Efficiency
Handling data items
immediately as they arrive.
buffering and processing them in
batches increased efficiency.
Low Latency High Throughput
SAMZA
STORM
SPARK
SPARK Streaming
Trident
Stream BatchMicro - Batch
groups tuples into batches
Restrict batch size
Real-time processors
 STORM
Storm was developed by
Nathan Marz as a BackType
project which was later
acquired by Twitter in the
year 2011.
initially promoted as the
“Hadoop of real-time”.
 The vital parts of a Storm
deployment are a ZooKeeper
cluster for reliable coordination.
Real-time processors
 STORM
Topology:
network made of spout and bolts
Similar to hadoop Map reduce.
Stream:
an unbounded pipeline of tuples
Spout & bolts:
receiving data continuously,
transforming those data into
actual stream of tuples and
finally sending them to the
bolts to be processed.
Real-time processors
 STORM
Nodes
Master Node:
runs a daemon called ‘Nimbus’,
which is similar to the ‘Job
Tracker’ of Hadoop cluster.
Assign Jobs.
Monitor performance.
Real-time processors
 STORM
Nodes
Worker Node:
runs a daemon called
‘Supervisor’.
run one or more worker
processes on its node.
Apache Zookeeper facilitates communication between
Nimbus and Supervisors with the help of message
acknowledgements and processing status.
Real-time processors
 SAMZA
It was initially created at LinkedIn, submitted to the Apache
Incubator in July 2013.
Samza was co-developed with the queueing system Kafka.
Samza requires a little more work than storm to deploy as it does
not only depend on a ZooKeeper cluster, but also runs on top of
Hadoop YARN.
Real-time processors
 SAMZA - YARN
cluster scheduler. It allows you to allocate a number
of containers (processes) in a cluster of machines, and execute
arbitrary commands on them, The Samza client uses YARN to run a
Samza job.
NodeManager: is responsible for launching processes on the
machine.
ResourceManager: Talks to all of the NodeManagers to tell
them what to run.
ApplicationMaster: is responsible for managing the
application’s workload, asking for containers, and handling
notifications when one of its containers fails.
Real-time processors
 SAMZA
decouples individual processing
steps.
buffering data between
processing steps makes
(intermediate) results available
to unrelated parties.
 Prevent data loss by periodically
checkpointing current progress
and reprocessing all data from
failure point.
Real-time processors
 SPARK
Is a batch-processing framework that is often mentioned as the in
official successor of Hadoop as it offers several benefits in
comparison.
significant performance improvements through in-memory
caching.
 Spark provides a variety of machine learning algorithms out-of-the-box
through the MLlib library.
Real-time processors
 SPARK – Architecture
Discussion
SPARKSAMZASTORM
Achievable latency
processing model
ordering guarantees
<< 100 ms < 100 ms < 1 s
one-at-a-time one-at-a-time Micro-batch
between batcheswithin stream partitionsNo
elasticity Yes YesNo
All these different systems show that low latency is involved in a
number of trade-offs with other desirable properties such as
throughput, fault-tolerance, reliability (processing guarantees) and
ease of development.
References
• https://www.quora.com/What-are-the-differences-between-
Apache-Spark-Storm-Samza-Flink-Beam-Apex
• https://www.quora.com/What-are-the-differences-between-batch-
processing-and-stream-processing-systems
• https://samza.apache.org/learn/documentation/0.10/introduction/
architecture.html
• https://dzone.com/articles/streaming-big-data-storm-spark
• Paper : Real-time stream processing for Big Data
Bereitgestellt von | Staats- und Universitätsbibliothek Hamburg
Angemeldet
Heruntergeladen am | 13.10.16 19:14
THANKS

Weitere ähnliche Inhalte

Was ist angesagt?

Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsBest Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Databricks
 

Was ist angesagt? (20)

The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ Netflix
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Introduction to Real-time data processing
Introduction to Real-time data processingIntroduction to Real-time data processing
Introduction to Real-time data processing
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsBest Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
 

Andere mochten auch

Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitter
nkallen
 
Wintpresen 150424142843-conversion-gate02
Wintpresen 150424142843-conversion-gate02Wintpresen 150424142843-conversion-gate02
Wintpresen 150424142843-conversion-gate02
carmstea
 
Hive Poster
Hive PosterHive Poster
Hive Poster
ragho
 

Andere mochten auch (20)

Comparison of Open Source Frameworks for Integrating the Internet of Things
Comparison of Open Source Frameworks for Integrating the Internet of ThingsComparison of Open Source Frameworks for Integrating the Internet of Things
Comparison of Open Source Frameworks for Integrating the Internet of Things
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitter
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...
 
FundRock Fact Sheet
FundRock Fact SheetFundRock Fact Sheet
FundRock Fact Sheet
 
Catalogue i-tec
Catalogue i-tec Catalogue i-tec
Catalogue i-tec
 
Conteúdo programático do curso de matemática básica
Conteúdo programático do curso de matemática básicaConteúdo programático do curso de matemática básica
Conteúdo programático do curso de matemática básica
 
CV
CVCV
CV
 
Link del blog
Link del blogLink del blog
Link del blog
 
Wintpresen 150424142843-conversion-gate02
Wintpresen 150424142843-conversion-gate02Wintpresen 150424142843-conversion-gate02
Wintpresen 150424142843-conversion-gate02
 
Práctica manejo de internet
Práctica manejo de internetPráctica manejo de internet
Práctica manejo de internet
 
Hive Poster
Hive PosterHive Poster
Hive Poster
 
Integrate ManifoldCF with Solr
Integrate ManifoldCF with SolrIntegrate ManifoldCF with Solr
Integrate ManifoldCF with Solr
 
Revisão 5 exercícios de leitura de gráficos (1)
Revisão 5   exercícios de leitura de gráficos (1)Revisão 5   exercícios de leitura de gráficos (1)
Revisão 5 exercícios de leitura de gráficos (1)
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)
 
Flipped Classroom and blended learning, pros, cons, similarities and differences
Flipped Classroom and blended learning, pros, cons, similarities and differencesFlipped Classroom and blended learning, pros, cons, similarities and differences
Flipped Classroom and blended learning, pros, cons, similarities and differences
 
Aprendizes 1 ano aula 1 parte a aula pdf
Aprendizes 1 ano aula 1 parte a aula pdfAprendizes 1 ano aula 1 parte a aula pdf
Aprendizes 1 ano aula 1 parte a aula pdf
 

Ähnlich wie Real time big data stream processing

Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 

Ähnlich wie Real time big data stream processing (20)

CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
Data Streaming in Kafka
Data Streaming in KafkaData Streaming in Kafka
Data Streaming in Kafka
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
Webinar - Big Data: Let's SMACK - Jorg Schad
Webinar - Big Data: Let's SMACK - Jorg SchadWebinar - Big Data: Let's SMACK - Jorg Schad
Webinar - Big Data: Let's SMACK - Jorg Schad
 
Colorado OpenStack 5th Birthday Monasca Operations
Colorado OpenStack 5th Birthday Monasca OperationsColorado OpenStack 5th Birthday Monasca Operations
Colorado OpenStack 5th Birthday Monasca Operations
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Scala in increasingly demanding environments - DATABIZ
Scala in increasingly demanding environments - DATABIZScala in increasingly demanding environments - DATABIZ
Scala in increasingly demanding environments - DATABIZ
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...
Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...
Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...
 
Apache samza past, present and future
Apache samza  past, present and futureApache samza  past, present and future
Apache samza past, present and future
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
 
Bdu -stream_processing_with_smack_final
Bdu  -stream_processing_with_smack_finalBdu  -stream_processing_with_smack_final
Bdu -stream_processing_with_smack_final
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 

Kürzlich hochgeladen

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Kürzlich hochgeladen (20)

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 

Real time big data stream processing

  • 1. Real-time stream processing for Big Data Presented by Luay AL-Assadi
  • 2. INTRODUCTION Rise of the web 2.0 and the Internet of things.  Huge amounts of data. (ex sensors, social media, online marketing).  Track all kinds of information that are only valuable for a short time and therefore have to be processed immediately.  Monitoring user activity to optimize product or video recommendations for the current user context. Traditional batch-oriented approaches.  Complex Event Processing (CEP) engines and DBMSs. Distributed data processing.  MapReduce.
  • 3. Real-time analytics: Big Data in motion  Real time Data infrastructure:  Built from distributed components.  Communicate via asynchronous network.  Engineered on top of the JVM(Java Virtual Machine).  Real time Big Data Basic Architecture Model:  Collecting data from various places.  Moving data to streaming layer.  Analyze data in stream processor.  Forwarding outputs to serving layer.
  • 4. Real-time analytics: Big Data in motion  Big Data Architecture Model: Collecting Data Streaming Data Batch processing Store Data Stream processing Serving Layer Lambda Architecture
  • 5. Real-time analytics: Big Data in motion  Big Data Architecture Models: Collecting Data Streaming Data Stream processing Serving Layer Kappa Architecture Store, retain Data
  • 6. Real-time streamers  RabbitMQ.  Broker centric, message Acknowledgement.  focused around delivery guarantees between producers and consumers.  fall over if your consumers were too slow. Producer ConsumerBROKER Message Ack
  • 7. Real-time streamers  Kafka. Producer centric. Online / Offline consumers. Use Zookeeper to reliably maintain their state across a cluster.
  • 8. Real-time processors: Latency Throughput & Efficiency Handling data items immediately as they arrive. buffering and processing them in batches increased efficiency. Low Latency High Throughput SAMZA STORM SPARK SPARK Streaming Trident Stream BatchMicro - Batch groups tuples into batches Restrict batch size
  • 9. Real-time processors  STORM Storm was developed by Nathan Marz as a BackType project which was later acquired by Twitter in the year 2011. initially promoted as the “Hadoop of real-time”.  The vital parts of a Storm deployment are a ZooKeeper cluster for reliable coordination.
  • 10. Real-time processors  STORM Topology: network made of spout and bolts Similar to hadoop Map reduce. Stream: an unbounded pipeline of tuples Spout & bolts: receiving data continuously, transforming those data into actual stream of tuples and finally sending them to the bolts to be processed.
  • 11. Real-time processors  STORM Nodes Master Node: runs a daemon called ‘Nimbus’, which is similar to the ‘Job Tracker’ of Hadoop cluster. Assign Jobs. Monitor performance.
  • 12. Real-time processors  STORM Nodes Worker Node: runs a daemon called ‘Supervisor’. run one or more worker processes on its node. Apache Zookeeper facilitates communication between Nimbus and Supervisors with the help of message acknowledgements and processing status.
  • 13. Real-time processors  SAMZA It was initially created at LinkedIn, submitted to the Apache Incubator in July 2013. Samza was co-developed with the queueing system Kafka. Samza requires a little more work than storm to deploy as it does not only depend on a ZooKeeper cluster, but also runs on top of Hadoop YARN.
  • 14. Real-time processors  SAMZA - YARN cluster scheduler. It allows you to allocate a number of containers (processes) in a cluster of machines, and execute arbitrary commands on them, The Samza client uses YARN to run a Samza job. NodeManager: is responsible for launching processes on the machine. ResourceManager: Talks to all of the NodeManagers to tell them what to run. ApplicationMaster: is responsible for managing the application’s workload, asking for containers, and handling notifications when one of its containers fails.
  • 15. Real-time processors  SAMZA decouples individual processing steps. buffering data between processing steps makes (intermediate) results available to unrelated parties.  Prevent data loss by periodically checkpointing current progress and reprocessing all data from failure point.
  • 16. Real-time processors  SPARK Is a batch-processing framework that is often mentioned as the in official successor of Hadoop as it offers several benefits in comparison. significant performance improvements through in-memory caching.  Spark provides a variety of machine learning algorithms out-of-the-box through the MLlib library.
  • 17. Real-time processors  SPARK – Architecture
  • 18. Discussion SPARKSAMZASTORM Achievable latency processing model ordering guarantees << 100 ms < 100 ms < 1 s one-at-a-time one-at-a-time Micro-batch between batcheswithin stream partitionsNo elasticity Yes YesNo All these different systems show that low latency is involved in a number of trade-offs with other desirable properties such as throughput, fault-tolerance, reliability (processing guarantees) and ease of development.
  • 19. References • https://www.quora.com/What-are-the-differences-between- Apache-Spark-Storm-Samza-Flink-Beam-Apex • https://www.quora.com/What-are-the-differences-between-batch- processing-and-stream-processing-systems • https://samza.apache.org/learn/documentation/0.10/introduction/ architecture.html • https://dzone.com/articles/streaming-big-data-storm-spark • Paper : Real-time stream processing for Big Data Bereitgestellt von | Staats- und Universitätsbibliothek Hamburg Angemeldet Heruntergeladen am | 13.10.16 19:14