Streaming options in the wild

A quick take on the different streaming options available out there. Speakers: Palash Chatterjee & Atif Akhtar

Streaming Your Data
Options in the wild
By: Palash Chatterjee & Atif Akhtar

Current Landscape

Data Stream
An abstraction representing an unbounded data set: one that is infinite by definition and ever-growing. Events in a stream are ordered and immutable.

What are the different types of options available out there?
Real-time processing
Near-real-time processing
Micro-batching

Stream Processing
Event Stream
Input Stream -> Transformation F(x) -> Transformation G(x) -> Output Stream
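
As a rough illustration of chaining two transformations over a stream, here is a minimal sketch in plain Scala. The `Reading` type and the bodies of `f` and `g` are hypothetical and chosen only for the example. An `Iterator` stands in for the event stream, so nothing is computed until the output is consumed, which is why the same pipeline also works on an input that never ends.

```scala
object StreamPipeline {
  // Hypothetical event type, used only for this illustration.
  final case class Reading(sensorId: String, celsius: Double)

  // F(x): convert a reading from Celsius to Fahrenheit.
  val f: Reading => Double = r => r.celsius * 9.0 / 5.0 + 32.0

  // G(x): classify the converted value.
  val g: Double => String = t => if (t >= 100.0) "HOT" else "OK"

  // Output stream = G(F(input stream)), evaluated lazily, one event at a time.
  def pipeline(input: Iterator[Reading]): Iterator[String] =
    input.map(f andThen g)

  def main(args: Array[String]): Unit = {
    // An unbounded input: this iterator never terminates on its own.
    val input = Iterator.from(0).map(i => Reading("sensor-1", i.toDouble))
    // Only the first three events are ever pulled through the pipeline.
    pipeline(input).take(3).foreach(println)
  }
}
```

Because the pipeline is just function composition, swapping in a different F or G does not change how events flow through it.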

Things to keep in mind
a. Time
   i. Event time
   ii. Log append time
   iii. Processing time
b. State
   i. Local or internal state
   ii. External state
c. Processing time window
d. Restartability/fault tolerance and reprocessing
e. Out-of-sequence events
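
Several of these concerns can be seen together in a small sketch: assigning events to tumbling windows by event time, keeping local state, and tolerating out-of-sequence events up to a limit. This is an illustrative sketch in plain Scala, not the behaviour of any particular engine. The `Event` type, the `allowedLatenessMs` parameter and the drop-if-too-late policy are assumptions made for the example.

```scala
object EventTimeWindows {
  // Hypothetical event: eventTimeMs is when the event happened, not when it arrived.
  final case class Event(key: String, eventTimeMs: Long)

  // Start of the tumbling window that contains the given event time.
  def windowStart(eventTimeMs: Long, windowSizeMs: Long): Long =
    eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs)

  // Count events per window, keyed by window start.
  // `events` is in arrival (processing) order, which may differ from event-time order.
  // An event is dropped when it is older than the highest event time seen so far
  // minus allowedLatenessMs (assumed policy for this sketch).
  def countPerWindow(events: Seq[Event], windowSizeMs: Long, allowedLatenessMs: Long): Map[Long, Int] = {
    var maxSeen = Long.MinValue
    // Local (internal) state: held in memory by this operator only.
    val counts = scala.collection.mutable.Map.empty[Long, Int]
    for (e <- events) {
      maxSeen = math.max(maxSeen, e.eventTimeMs)
      if (e.eventTimeMs >= maxSeen - allowedLatenessMs) {
        val w = windowStart(e.eventTimeMs, windowSizeMs)
        counts.update(w, counts.getOrElse(w, 0) + 1)
      }
    }
    counts.toMap
  }
}
```

Note that the local state here is lost on restart; restartability would require persisting it or rebuilding it by reprocessing the input.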

Use Cases for Streaming
Stock Market Analysis
IoT
Log Monitoring
Business Analysis
Complex Event Processing
Clickstream Analysis

Kafka

Flume

Flume vs. Kafka
FLUME | KAFKA
Meant to collect data and put it in one place (HDFS or HBase); built for Hadoop | General purpose; highly scalable pub/sub
Push | Pull; handles spikes very well
Not dynamically scalable | Can add more publishers and subscribers without restarting
Has more connectors | Has a better community; has connectors now
No guarantee about order of delivery | Order of delivery preserved within a partition
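
To make the "order preserved within a partition" point concrete, here is a small sketch of a partitioned, append-only log in plain Scala. It is a model for illustration only: `partitionFor` uses `hashCode`, which is an assumption and not Kafka's actual partitioner, and the in-memory `Map` stands in for the broker.

```scala
object PartitionedLog {
  type Record = (String, String) // (key, value)

  // Pick a partition from the record key. Using hashCode is an assumption
  // for this sketch; it is not how Kafka itself hashes keys.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Append records to per-partition logs in arrival order.
  // Records with the same key always land in the same partition,
  // so their relative order is kept. No order is defined across partitions.
  def append(records: Seq[Record], numPartitions: Int): Map[Int, Vector[Record]] =
    records.foldLeft(Map.empty[Int, Vector[Record]]) { (logs, rec) =>
      val p = partitionFor(rec._1, numPartitions)
      logs.updated(p, logs.getOrElse(p, Vector.empty[Record]) :+ rec)
    }
}
```

A consumer that needs all events for one entity in order would therefore key its records by that entity.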

Spark Streaming
➔ Windowed micro-batching
➔ Highly scalable and dynamic
➔ Huge community and well tested
➔ Huge library for ML/SQL/analytics
➔ Lots of third-party tools integrate directly
➔ No support for per-event streaming
➔ Very difficult to handle out-of-batch events
➔ Micro-batching introduces latency
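
The latency cost of micro-batching can be shown with simple arithmetic: an event has to wait until its batch interval closes before processing even starts. The sketch below is plain Scala and does not use the Spark API. The function names are hypothetical, and it assumes non-negative arrival times in milliseconds.

```scala
object MicroBatch {
  // Time at which the batch containing arrival time t closes.
  private def batchClose(t: Long, intervalMs: Long): Long =
    (t / intervalMs + 1) * intervalMs

  // Group arrival times into fixed batch intervals, keyed by batch close time.
  def batches(arrivalsMs: Seq[Long], intervalMs: Long): Map[Long, Seq[Long]] =
    arrivalsMs.groupBy(t => batchClose(t, intervalMs))

  // How long each event waits before its batch is handed to processing.
  def waitTimes(arrivalsMs: Seq[Long], intervalMs: Long): Seq[Long] =
    arrivalsMs.map(t => batchClose(t, intervalMs) - t)
}
```

With a 1000 ms interval, an event arriving at 250 ms waits 750 ms before its batch closes, while a per-event engine could start on it immediately.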

Storm

Storm/Heron
➔ Near-real-time processing [micro-batching using Trident]
➔ No single point of failure
➔ At-least-once processing guarantee [exactly-once using Trident]
➔ Windowing support [using Trident]
➔ Little community support
➔ Not tied to Hadoop
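
The difference between at-least-once and exactly-once results can be sketched without any Storm code. Under at-least-once delivery a message may be redelivered, so a naive consumer counts it twice. The example below uses deduplication by message id, a generic technique chosen for illustration. It is not a description of how Trident implements exactly-once, and the `Message` type is hypothetical.

```scala
object DeliveryGuarantees {
  // Hypothetical message with a unique id assigned by the producer.
  final case class Message(id: Long, amount: Int)

  // At-least-once without deduplication: redelivered messages are counted again.
  def naiveTotal(delivered: Seq[Message]): Int =
    delivered.map(_.amount).sum

  // Deduplicate by id so that processing a redelivered message has no effect.
  // The set of seen ids is state the consumer has to keep
  // (and persist, in a real system, to survive restarts).
  def dedupedTotal(delivered: Seq[Message]): Int = {
    val (_, total) = delivered.foldLeft((Set.empty[Long], 0)) {
      case ((seen, sum), m) =>
        if (seen.contains(m.id)) (seen, sum)
        else (seen + m.id, sum + m.amount)
    }
    total
  }
}
```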

Apache Samza
➔ Performs near-real-time, per-event processing
➔ Works on top of YARN
➔ Lots of connectors for Hadoop tools
➔ Stateful
➔ Tied into Hadoop
➔ Topologies cannot be connected; everything needs to be written to Kafka
➔ Fairly new, with a very small community
➔ JVM languages only
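
The point about topologies not being connected directly can be pictured as two independent jobs that only share an intermediate topic. The sketch below is plain Scala and does not use the Samza API. The `Topic` class is an in-memory stand-in for a Kafka topic, and both jobs are hypothetical.

```scala
object ChainedJobs {
  import scala.collection.mutable.ArrayBuffer
  import scala.util.Try

  // In-memory stand-in for a Kafka topic: an append-only log (assumption for this sketch).
  final class Topic[A] {
    private val log = ArrayBuffer.empty[A]
    def append(a: A): Unit = { log += a; () }
    def readFrom(offset: Int): List[A] = log.drop(offset).toList
  }

  // Job 1: parse raw lines and write the parsed values to an intermediate topic.
  // Lines that are not integers are skipped.
  def parseJob(in: Topic[String], out: Topic[Int]): Unit =
    in.readFrom(0).flatMap(s => Try(s.trim.toInt).toOption).foreach(out.append)

  // Job 2: a separate job that knows nothing about job 1, only about the topic.
  def sumJob(in: Topic[Int]): Int =
    in.readFrom(0).sum
}
```

The extra write and read add a hop, but each job can be restarted or replaced on its own because the topic holds the data between them.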

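Akka Streams
    // throttle, redditAPIRate and RedditAPI are defined elsewhere and are not shown on the slide.
    // Note: current Akka versions use NotUsed instead of Unit as the materialized value type,
    // and mapAsyncUnordered takes a parallelism argument before the function.
    val fetchLinks: Flow[String, Link, Unit] =
      Flow[String]
        .via(throttle(redditAPIRate))
        .mapAsyncUnordered(subreddit => RedditAPI.popularLinks(subreddit))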

➔ Performs near-real-time, per-event processing
➔ Built for the use case of handling backpressure on a single node; reactive backpressure handling
➔ Handles backpressure efficiently up to the OS level
➔ Being used internally by the latest version of Spark Streaming to boost performance
➔ Not an alternative to Spark
➔ Must follow and respect the Actor pattern everywhere
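
The idea behind reactive backpressure is that the consumer signals how much it can take and the producer never sends more than that. The sketch below is a simplified, single-threaded illustration of demand signalling in plain Scala. It does not use the Akka Streams API, and `DemandDrivenSource` and `run` are hypothetical names.

```scala
object Backpressure {
  // A source that emits only in response to demand signalled by the downstream.
  final class DemandDrivenSource(items: Iterator[Int]) {
    // Emit at most `demand` items and return what was actually emitted.
    def request(demand: Int): List[Int] = {
      val out = List.newBuilder[Int]
      var n = 0
      while (n < demand && items.hasNext) {
        out += items.next()
        n += 1
      }
      out.result()
    }
  }

  // A slow consumer that asks for `batch` items per round.
  // The source can be unbounded; it is never asked for more than the consumer can handle.
  def run(source: DemandDrivenSource, batch: Int, rounds: Int): List[List[Int]] =
    List.fill(rounds)(source.request(batch))
}
```

Because the producer is driven by demand, a fast or even infinite source cannot overflow a slow consumer's buffers.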

At a glance
Source: https://mapr.com/blog/stream-processing-everywhere-what-use/

Use Case - Real Time Image Tagging

Use Case - Product and Per-Interval Trends Reporting

References and Good Reads
1. http://milinda.pathirage.org/kappa-architecture.com/
2. https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/
3. https://www.youtube.com/results?search_query=reactive+streams+akka
4. https://en.wikipedia.org/wiki/Lambda_architecture
5. https://stackoverflow.com/questions/29111549/where-do-apache-samza-and-apache-storm-differ-in-their-use-cases

QUESTIONS

THANK YOU
