SlideShare ist ein Scribd-Unternehmen logo
1 von 46
Downloaden Sie, um offline zu lesen
Kafka and Storm – event
processing in realtime
Guido Schmutz
WELCOME

Kafka and Storm – event
processing in realtime
Guido Schmutz

Jazoon 2013
23.10.2013

BASEL

2

BERN

LAUSANNE

ZÜRICH

DÜSSELDORF

FRANKFURT A.M.

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

FREIBURG I.BR.

HAMBURG

MÜNCHEN

STUTTGART

WIEN

Guido Schmutz
• 
• 

Working for Trivadis for more than 16 years
Oracle ACE Director for Fusion Middleware and SOA

• 
• 

Co-Author of different books
Consultant, Trainer Software Architect for Java, Oracle, SOA,
EDA, BigData und FastData

• 
• 

Member of Trivadis Architecture Board
Technology Manager @ Trivadis

• 

More than 20 years of software development 

experience

• 

Contact: guido.schmutz@trivadis.com

• 
• 

Blog: http://guidoschmutz.wordpress.com
Twitter: gschmutz

3

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Our company
Trivadis is a market leader in IT consulting, system integration,
solution engineering and the provision of IT services focusing
on
and
technologies in Switzerland,
Germany and Austria.
We offer our services in the following strategic business fields:

OPERATION

Trivadis Services takes over the interacting operation of your IT systems.
4

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
With over 600 specialists and IT experts in your region

Hamburg

Düsseldorf

Frankfurt

Stuttgart

Freiburg

Wien
München

Basel Brugg
Bern
Zurich
Lausanne

5
5

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

12 Trivadis branches and more than
600 employees
 
200 Service Level Agreements
 
Over 4,000 training participants
 
Research and development budget:
CHF 5.0 / EUR 4 million
 
Financially self-supporting and
sustainably profitable
 
Experience from more than 1,900
projects per year at over 800
customers
Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

6

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Big Data Definition (Gartner et al)

Characteristics of Big Data: Its
Volume, Velocity and Variety in
combination

Tera-, Peta-, Exa-, Zetta-, Yota- bytes and constantly growing

Velocity
“Traditional” computing in RDBMS 

is not scalable enough. 

We search for “linear scalability”

“Only … structured information 

is not enough” – “95% of produced data in
unstructured”

+ Veracity (IBM) - information uncertainty
+ Time to action ? – Big Data + Event Processing = Fast Data
7

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
8

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Sample – Show the counts of tweets related to Jazoon (using
term jazoon) 

23.10.2013 – 15:45 not
showing #jazoon
http://54.217.159.208:8484/jazoon-restapi/

9

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Velocity
§  Velocity requirement examples:
§ 
§ 
§ 
§ 
§ 
§ 
§ 
§ 
§ 
§ 
§ 
§ 

10

Recommendation Engine
Predictive Analytics
Marketing Campaign Analysis
Customer Retention and Churn Analysis
Social Graph Analysis
Capital Markets Analysis
Risk Management
Rogue Trading
Fraud Detection
Retail Banking
Network Monitoring
Research and Development

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

11

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Apache Kafka
•  A distributed publish-subscribe messaging system
•  Designed for processing of real time activity stream data (logs, metrics
collections, social media streams, …)
•  Initially developed at LinkedIn, now part of Apache
•  Does not follow JMS Standards and does not use JMS API
•  Kafka maintains feeds of messages in topics

12

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Sample (#1)
1) Use the Twitter Streaming API (Twitter Horsebird Client) to get all the
tweets containing the term jazoon
Twitter
Stream

13

tweet

Twitter to
Kafka

tweet

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Kafka
Topic

Create a Filter Stream for
tracking a list of terms
Twitter
Stream

Sample (#1)

tweet

Twitter to
Kafka

tweet

Kafka
Topic

Create a fully configured
Horsebird Client instance

Start the client in
multiple threads

14

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Twitter
Stream

Sample (#1)

tweet

Twitter to
Kafka

tweet

Kafka
Topic

Convert Twitter status
to Avro format

Create the Kafka producer

Send the message
to the Kafka topic
15

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

16

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
What is Twitter Storm ?
A platform for doing analysis on streams of data as they come in, so you
can react to data as it happens.
•  A highly distributed real-time computation system
•  Provides general primitives to do real-time computation
•  To simplify working with queues & workers
•  scalable and fault-tolerant
•  complementary to Hadoop
Originated at Backtype, acquired by Twitter in 2011
Open Sourced late 2011
Part of Apache Incubator since September 2013
17

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Hadoop vs. Storm
Hadoop

Storm

Batch processing
Jobs runs to completion
Stateful nodes
Scalable
Guarantees no data loss
Open Source
=> big batch processing
18

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

!=
=

Real-time processing
Topologies run forever
Stateless nodes
Scalable
Guarantees no data loss
Open Source

=> Fast, reactive, real time procesing
Idea: Simplify dealing with queue & workers
Scaling is painful – queue partitioning & worker deploy
Operational overhead – worker failures & queue backups
No guarantees on data processing

19

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
What does Storm provide?
§  At least once message processing
§  Fault-tolerant
§  Horizontal scalability
§  No intermediate queues
§  Less operational overhead
§  „Just works“
§  Broad set of use cases
§  Stream processing
§  Continous computation
§  Distributed RPC

20

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Main Concepts - Streams
Stream
•  an unbounded sequence of tuples
•  core abstraction in Storm
•  Defined with a schema that names the 

fields in the tuple
•  Value must be serializable
•  Every stream has an id

T

T

T

T

T

T

Subscribes: C & D
Emits: Source of
stream A

Source of
stream B

21

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

T

Subscribes: A
Emits: C

Topology
•  graph where each node is a spout or bolt
•  edges indicating which bolt subscribes 

to which streams

T

Subscribes: A
Emits: D

Subscribes: A & B
Emits: -
Main Concepts – Worker, Executor, Task
Worker
•  executes a subset of a topology, may run one or more executors for one or
more components

Executor
•  thread spawned by worker

Task
•  performs the actual data processing

Worker Process
Task

Task

22

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Task

Task
Main Concepts - Spout
A spout is the component which acts as the source of streams in a
topology
Generally spouts read tuples from an external source and emit them into
the topology
Spouts can emit more than one stream

23

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Main Concepts - Bolt
Bolt is doing the processing of the message
•  filtering, functions, aggregations, joins, talking to databases….
•  Can do simple stream transformations
•  Complex transformations often require multiple steps and therefore
multiple bolts
Bolts can emit one or more streams

24

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Sample (#2)
2) Use a Spout to subscribe to the Kafka Topic to get the related tweets
Twitter
Stream
tweet

Twitter to
Kafka
tweet

Kafka
Topic

25

tweet

Kafka

Spout

tweet

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Kafka
Topic

Sample (#2)

Tweets

Kafka

Spout

tweet

Use existing implementation of the Kafka Spout from storm-kafka project

Create a Kafka configuration

26
2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Create Kafka spout to connect to the
topic
Sample (#3)
3) Use a bolt to retrieve the hashtags for each tweet
Twitter
Stream
tweet

Twitter to
Kafka
tweet

Kafka
Topic

27

tweet

Kafka

Spout

tweet

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Hashtag

Splitter

hashtag
Kafka
Topic

Sample (#3)

Tweets

Kafka

Spout

tweet

Hashtag

Splitter

hashtag

Execute is called for each
tuple arriving on the input
stream

Declares the output fields for the component
and is called when bolt is created

28

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Sample (#4)
4) Use a bolt to count the occurrences of hashtags into a Redis NoSQL DB
Twitter
Stream
Tweets

Twitter to
Kafka
Tweets

Kafka
Topic

29

Tweets

Kafka

Spout

tweet

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Hashtag

Splitter

hashtag

Hashtag

Counter
Sample (#4)

Kafka
Topic

Tweets

Kafka

Spout

tweet

Hashtag

Splitter

hashtag

Hashtag

Counter

Execute is called for each tuple
arriving on the input stream

Prepare is called when bolt
is created

30

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Main Concepts - Topology

Kafka
Topic

Tweets

Kafka

Spout

Hashtag

Splitter
Shuffle

Hashtag

Counter
Fields

Hashtag

Splitter

Hashtag

Counter

Each Spout or Bolt are running N instances in parallel
Shuffle grouping

is random grouping

Fields grouping

is grouped by value, such that equal value results in equal task

All grouping

replicates to all tasks

Global grouping

makes all tuples go to one task

None grouping

makes bolt run in the same thread as bold/spout it subscribes to

Direct grouping

producer (task that emits) controls which consumer will receive

31

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Sample – How does it work ?

My presentation
„Kafka and Storm –
event processing in
realtime“ tomorrow
12:00 at Jazoon
Zurich #jazoon
#storm #kafka CU
there!

Kafka
Topic

32

Kafka

Spout

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Hashtag

Splitter

Hashtag

Counter

Hashtag

Splitter

Hashtag

Counter
Sample – How does it work ?

My presentation
„Kafka and Storm –
event processing in
realtime“ tomorrow
12:00 at Jazoon
Zurich #jazoon
#storm #kafka CU
there!

Kafka
Topic

33

Kafka

Spout

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Hashtag

Splitter
Hashtag

Splitter

jazoon
storm
kafka

Hashtag

Counter
Hashtag

Counter
Sample – How does it work ?

My presentation
„Kafka and Storm –
event processing in
realtime“ tomorrow
12:00 at Jazoon
Zurich #jazoon
#storm #kafka CU
there!

Kafka
Topic

34

Kafka

Spout

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Hashtag

Splitter
Hashtag

Splitter

jazoon
storm
kafka

Hashtag

Counter
Hashtag

Counter

INCR
jazoon

INCR
storm
INCR
kafka

jazoon = 1
storm = 1
kafka = 1
Sample (#5)
5) Setup the topology with the necessary groupings (distribution)
Create Stream called „tweet-stream“
Register Kafka
spout and run by 1
worker
Subscribe to stream „tweetstream“ using shuffle grouping

Run this bolt
by 2 workers
Subscribe to „hashtag-splitter“ using
fields grouping on field „hashtag“
35

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Trident
•  High-Level abstraction on top of storm
•  Makes it easier to build topologies
•  Core data model is the stream
•  Processed as a series of batches
•  Stream is partitioned among nodes in cluster

•  5 kinds of operations in Trident
•  Operations that apply locally to each partition and cause no network
transfer
•  Repartitioning operations that don‘t change the contents
•  Aggregation operations that do network transfer
•  Operations on grouped streams
•  Merges and Joins

36

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Trident
Supports
• 
• 
• 
• 
• 

Joins
Aggregations
Grouping
Functions
Filters

Similar to Hadoop and Pig/Cascading

37

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Trident Concepts - Topology

Kafka
Topic

38

tweet

Kafka

Spout

tweet

Bolt
Hashtag

Splitter

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

hashtag
local

Hashtag

Normalizer

hashtag
groupBy

Bolt
Persistent

Aggregate
Trident Concepts - Function
•  takes in a set of input fields and emits zero or more tuples as output
•  fields of the output tuple are appended to the original input tuple in the
stream
•  If a function emits no tuples, the original input tuple is filtered out
•  Otherwise the input tuple is duplicated for each output tuple

39

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

40

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Summary
Kafka

Storm

•  Distributed Scalable Pub/Sub system
for Big Data

•  Express real-time processing naturally

•  Producer => Broker => Consumer of
message topics
•  Persists messages with ability to
rewind
•  Consumer decides what he as
consumed so far

•  Not a Hadoop/MapReduce competitor
•  Supports other languages
•  Hard to debug
•  Object Serialization
•  Didn‘t cover
§  Reliability through guaranteed message
processing
§  Distributed RPC
§  Storm cluster setup and deploy

http://54.217.159.208:8484/jazoon-restapi/
41

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Lambda Architecture

Big Data and Fast Data combined

Precompute

Precomputed
Views
information

All data

Batch
recompute

Incoming
Data

Serving Layer
batch view
batch view
Merge

Batch Layer

Speed Layer
Process stream

Incremented
information

Realtime
increment

query

real time view
real time view

Source: Marz, N. & Warren, J. (2013) Big Data. Manning.

42

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Lambda Architecture

Big Data and Fast Data combined
one possible product/framework mapping

Precompute

Precomputed
Views
information

All data

Batch
recompute

Incoming
Data

batch view
batch view

Speed Layer
Process stream

Incremented
information

Realtime
increment

43

Serving Layer

Merge

Batch Layer

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

real time view
real time view

query
Resources
Sample Code: https://github.com/gschmutz/jazoon-storm-twitter-sample
Twitter Streaming API: https://dev.twitter.com/docs/streaming-apis
Twitter Horsebird Client (HBC): https://github.com/twitter/hbc
Apache Kafka: http://kafka.apache.org/
Storm Website: http://storm-project.net/
Storm Wiki: https://github.com/nathanmarz/storm/wiki
Storm Doc: https://github.com/nathanmarz/storm/wiki/Documentation

44

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013
Thank You!

BASEL

45

BERN

LAUSANNE

ZÜRICH

DÜSSELDORF

Trivadis AG
Guido Schmutz

guido.schmutz@trivadis.com

FRANKFURT A.M.

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

FREIBURG I.BR.

HAMBURG

MÜNCHEN

STUTTGART

WIEN

Lambda Architecture

Big Data and Fast Data combined
Batch Layer

Serving Layer

Immutable
data

Batch
View

B
Incoming
Data

C

D

A

G

Speed Layer
Data
Stream
E

46

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013

Realtime
View
F

Query

Weitere ähnliche Inhalte

Was ist angesagt?

Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링JANGWONSEO4
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Unified Log Processing Architecture
Unified Log Processing ArchitectureUnified Log Processing Architecture
Unified Log Processing ArchitectureGuido Schmutz
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
Event Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and ConfluentEvent Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and Confluentconfluent
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patternshadooparchbook
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBWilliam LaForest
 
REST-API introduction for developers
REST-API introduction for developersREST-API introduction for developers
REST-API introduction for developersPatrick Savalle
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Edureka!
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...HostedbyConfluent
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 

Was ist angesagt? (20)

Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Unified Log Processing Architecture
Unified Log Processing ArchitectureUnified Log Processing Architecture
Unified Log Processing Architecture
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Event Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and ConfluentEvent Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and Confluent
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patterns
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
 
REST-API introduction for developers
REST-API introduction for developersREST-API introduction for developers
REST-API introduction for developers
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...
 
RESTful API - Best Practices
RESTful API - Best PracticesRESTful API - Best Practices
RESTful API - Best Practices
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 

Ähnlich wie Kafka and Storm for Real-Time Event Processing

Twitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitTwitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitGuido Schmutz
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentTimothy Spann
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023ssuser73434e
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureGuido Schmutz
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedGuido Schmutz
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoTJim Haughwout
 
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloAzure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloITCamp
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...
INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...
INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...apidays
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basFlorent Ramiere
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructuremattlieber
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 

Ähnlich wie Kafka and Storm for Real-Time Event Processing (20)

Twitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitTwitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in Echtzeit
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloAzure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...
INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...
INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your B...
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 

Mehr von Guido Schmutz

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as CodeGuido Schmutz
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsGuido Schmutz
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureGuido Schmutz
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaGuido Schmutz
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaGuido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming VisualisationGuido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 

Mehr von Guido Schmutz (20)

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 

Kürzlich hochgeladen

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Kürzlich hochgeladen (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Kafka and Storm for Real-Time Event Processing

  • 1. Kafka and Storm – event processing in realtime Guido Schmutz
  • 2. WELCOME Kafka and Storm – event processing in realtime Guido Schmutz
 Jazoon 2013 23.10.2013 BASEL 2 BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

  • 3. Guido Schmutz •  •  Working for Trivadis for more than 16 years Oracle ACE Director for Fusion Middleware and SOA •  •  Co-Author of different books Consultant, Trainer Software Architect for Java, Oracle, SOA, EDA, BigData und FastData •  •  Member of Trivadis Architecture Board Technology Manager @ Trivadis •  More than 20 years of software development 
 experience •  Contact: guido.schmutz@trivadis.com •  •  Blog: http://guidoschmutz.wordpress.com Twitter: gschmutz 3 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 4. Our company Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria. We offer our services in the following strategic business fields: OPERATION Trivadis Services takes over the interacting operation of your IT systems. 4 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 5. With over 600 specialists and IT experts in your region Hamburg Düsseldorf Frankfurt Stuttgart Freiburg Wien München Basel Brugg Bern Zurich Lausanne 5 5 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 12 Trivadis branches and more than 600 employees   200 Service Level Agreements   Over 4,000 training participants   Research and development budget: CHF 5.0 / EUR 4 million   Financially self-supporting and sustainably profitable   Experience from more than 1,900 projects per year at over 800 customers
  • 6. Agenda 1.  Introduction 2.  What is Apache Kafka? 3.  What is Twitter Storm? 4.  Summary 6 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 7. Big Data Definition (Gartner et al) Characteristics of Big Data: Its Volume, Velocity and Variety in combination Tera-, Peta-, Exa-, Zetta-, Yota- bytes and constantly growing Velocity “Traditional” computing in RDBMS 
 is not scalable enough. 
 We search for “linear scalability” “Only … structured information 
 is not enough” – “95% of produced data in unstructured” + Veracity (IBM) - information uncertainty + Time to action ? – Big Data + Event Processing = Fast Data 7 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 8. 8 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 9. Sample – Show the counts of tweets related to Jazoon (using term jazoon) 
 23.10.2013 – 15:45 not showing #jazoon http://54.217.159.208:8484/jazoon-restapi/ 9 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 10. Velocity §  Velocity requirement examples: §  §  §  §  §  §  §  §  §  §  §  §  10 Recommendation Engine Predictive Analytics Marketing Campaign Analysis Customer Retention and Churn Analysis Social Graph Analysis Capital Markets Analysis Risk Management Rogue Trading Fraud Detection Retail Banking Network Monitoring Research and Development 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 11. Agenda 1.  Introduction 2.  What is Apache Kafka? 3.  What is Twitter Storm? 4.  Summary 11 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 12. Apache Kafka •  A distributed publish-subscribe messaging system •  Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …) •  Initially developed at LinkedIn, now part of Apache •  Does not follow JMS Standards and does not use JMS API •  Kafka maintains feeds of messages in topics 12 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 13. Sample (#1) 1) Use the Twitter Streaming API (Twitter Horsebird Client) to get all the tweets containing the term jazoon Twitter Stream 13 tweet Twitter to Kafka tweet 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Kafka Topic Create a Filter Stream for tracking a list of terms
  • 14. Twitter Stream Sample (#1) tweet Twitter to Kafka tweet Kafka Topic Create a fully configured Horsebird Client instance Start the client in multiple threads 14 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 15. Twitter Stream Sample (#1) tweet Twitter to Kafka tweet Kafka Topic Convert Twitter status to Avro format Create the Kafka producer Send the message to the Kafka topic 15 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 16. Agenda 1.  Introduction 2.  What is Apache Kafka? 3.  What is Twitter Storm? 4.  Summary 16 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 17. What is Twitter Storm ? A platform for doing analysis on streams of data as they come in, so you can react to data as it happens. •  A highly distributed real-time computation system •  Provides general primitives to do real-time computation •  To simplify working with queues & workers •  scalable and fault-tolerant •  complementary to Hadoop Originated at Backtype, acquired by Twitter in 2011 Open Sourced late 2011 Part of Apache Incubator since September 2013 17 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 18. Hadoop vs. Storm Hadoop Storm Batch processing Jobs runs to completion Stateful nodes Scalable Guarantees no data loss Open Source => big batch processing 18 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 != = Real-time processing Topologies run forever Stateless nodes Scalable Guarantees no data loss Open Source => Fast, reactive, real time procesing
  • 19. Idea: Simplify dealing with queue & workers Scaling is painful – queue partitioning & worker deploy Operational overhead – worker failures & queue backups No guarantees on data processing 19 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 20. What does Storm provide? §  At least once message processing §  Fault-tolerant §  Horizontal scalability §  No intermediate queues §  Less operational overhead §  „Just works“ §  Broad set of use cases §  Stream processing §  Continous computation §  Distributed RPC 20 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 21. Main Concepts - Streams Stream •  an unbounded sequence of tuples •  core abstraction in Storm •  Defined with a schema that names the 
 fields in the tuple •  Value must be serializable •  Every stream has an id T T T T T T Subscribes: C & D Emits: Source of stream A Source of stream B 21 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 T Subscribes: A Emits: C Topology •  graph where each node is a spout or bolt •  edges indicating which bolt subscribes 
 to which streams T Subscribes: A Emits: D Subscribes: A & B Emits: -
  • 22. Main Concepts – Worker, Executor, Task Worker •  executes a subset of a topology, may run one or more executors for one or more components Executor •  thread spawned by worker Task •  performs the actual data processing Worker Process Task Task 22 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Task Task
  • 23. Main Concepts - Spout A spout is the component which acts as the source of streams in a topology Generally spouts read tuples from an external source and emit them into the topology Spouts can emit more than one stream 23 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 24. Main Concepts - Bolt Bolt is doing the processing of the message •  filtering, functions, aggregations, joins, talking to databases…. •  Can do simple stream transformations •  Complex transformations often require multiple steps and therefore multiple bolts Bolts can emit one or more streams 24 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 25. Sample (#2) 2) Use a Spout to subscribe to the Kafka Topic to get the related tweets Twitter Stream tweet Twitter to Kafka tweet Kafka Topic 25 tweet Kafka
 Spout tweet 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 26. Kafka Topic Sample (#2) Tweets Kafka
 Spout tweet Use existing implementation of the Kafka Spout from storm-kafka project Create a Kafka configuration 26 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Create Kafka spout to connect to the topic
  • 27. Sample (#3) 3) Use a bolt to retrieve the hashtags for each tweet Twitter Stream tweet Twitter to Kafka tweet Kafka Topic 27 tweet Kafka
 Spout tweet 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Hashtag
 Splitter hashtag
  • 28. Kafka Topic Sample (#3) Tweets Kafka
 Spout tweet Hashtag
 Splitter hashtag Execute is called for each tuple arriving on the input stream Declares the output fields for the component and is called when bolt is created 28 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 29. Sample (#4) 4) Use a bolt to count the occurrences of hashtags into a Redis NoSQL DB Twitter Stream Tweets Twitter to Kafka Tweets Kafka Topic 29 Tweets Kafka
 Spout tweet 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Hashtag
 Splitter hashtag Hashtag
 Counter
  • 30. Sample (#4) Kafka Topic Tweets Kafka
 Spout tweet Hashtag
 Splitter hashtag Hashtag
 Counter Execute is called for each tuple arriving on the input stream Prepare is called when bolt is created 30 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 31. Main Concepts - Topology Kafka Topic Tweets Kafka
 Spout Hashtag
 Splitter Shuffle Hashtag
 Counter Fields Hashtag
 Splitter Hashtag
 Counter Each Spout or Bolt are running N instances in parallel Shuffle grouping is random grouping Fields grouping is grouped by value, such that equal value results in equal task All grouping replicates to all tasks Global grouping makes all tuples go to one task None grouping makes bolt run in the same thread as bold/spout it subscribes to Direct grouping producer (task that emits) controls which consumer will receive 31 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 32. Sample – How does it work ? My presentation „Kafka and Storm – event processing in realtime“ tomorrow 12:00 at Jazoon Zurich #jazoon #storm #kafka CU there! Kafka Topic 32 Kafka
 Spout 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Hashtag
 Splitter Hashtag
 Counter Hashtag
 Splitter Hashtag
 Counter
  • 33. Sample – How does it work ? My presentation „Kafka and Storm – event processing in realtime“ tomorrow 12:00 at Jazoon Zurich #jazoon #storm #kafka CU there! Kafka Topic 33 Kafka
 Spout 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Hashtag
 Splitter Hashtag
 Splitter jazoon storm kafka Hashtag
 Counter Hashtag
 Counter
  • 34. Sample – How does it work ? My presentation „Kafka and Storm – event processing in realtime“ tomorrow 12:00 at Jazoon Zurich #jazoon #storm #kafka CU there! Kafka Topic 34 Kafka
 Spout 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Hashtag
 Splitter Hashtag
 Splitter jazoon storm kafka Hashtag
 Counter Hashtag
 Counter INCR jazoon INCR storm INCR kafka jazoon = 1 storm = 1 kafka = 1
  • 35. Sample (#5) 5) Setup the topology with the necessary groupings (distribution) Create Stream called „tweet-stream“ Register Kafka spout and run by 1 worker Subscribe to stream „tweetstream“ using shuffle grouping Run this bolt by 2 workers Subscribe to „hashtag-splitter“ using fields grouping on field „hashtag“ 35 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 36. Trident •  High-Level abstraction on top of storm •  Makes it easier to build topologies •  Core data model is the stream •  Processed as a series of batches •  Stream is partitioned among nodes in cluster •  5 kinds of operations in Trident •  Operations that apply locally to each partition and cause no network transfer •  Repartitioning operations that don‘t change the contents •  Aggregation operations that do network transfer •  Operations on grouped streams •  Merges and Joins 36 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 37. Trident Supports •  •  •  •  •  Joins Aggregations Grouping Functions Filters Similar to Hadoop and Pig/Cascading 37 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 38. Trident Concepts - Topology Kafka Topic 38 tweet Kafka
 Spout tweet Bolt Hashtag
 Splitter 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 hashtag local Hashtag
 Normalizer hashtag groupBy Bolt Persistent
 Aggregate
  • 39. Trident Concepts - Function •  takes in a set of input fields and emits zero or more tuples as output •  fields of the output tuple are appended to the original input tuple in the stream •  If a function emits no tuples, the original input tuple is filtered out •  Otherwise the input tuple is duplicated for each output tuple 39 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 40. Agenda 1.  Introduction 2.  What is Apache Kafka? 3.  What is Twitter Storm? 4.  Summary 40 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 41. Summary Kafka Storm •  Distributed Scalable Pub/Sub system for Big Data •  Express real-time processing naturally •  Producer => Broker => Consumer of message topics •  Persists messages with ability to rewind •  Consumer decides what he as consumed so far •  Not a Hadoop/MapReduce competitor •  Supports other languages •  Hard to debug •  Object Serialization •  Didn‘t cover §  Reliability through guaranteed message processing §  Distributed RPC §  Storm cluster setup and deploy http://54.217.159.208:8484/jazoon-restapi/ 41 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 42. Lambda Architecture
 Big Data and Fast Data combined Precompute Precomputed Views information All data Batch recompute Incoming Data Serving Layer batch view batch view Merge Batch Layer Speed Layer Process stream Incremented information Realtime increment query real time view real time view Source: Marz, N. & Warren, J. (2013) Big Data. Manning. 42 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 43. Lambda Architecture
 Big Data and Fast Data combined one possible product/framework mapping Precompute Precomputed Views information All data Batch recompute Incoming Data batch view batch view Speed Layer Process stream Incremented information Realtime increment 43 Serving Layer Merge Batch Layer 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 real time view real time view query
  • 44. Resources Sample Code: https://github.com/gschmutz/jazoon-storm-twitter-sample Twitter Streaming API: https://dev.twitter.com/docs/streaming-apis Twitter Horsebird Client (HBC): https://github.com/twitter/hbc Apache Kafka: http://kafka.apache.org/ Storm Website: http://storm-project.net/ Storm Wiki: https://github.com/nathanmarz/storm/wiki Storm Doc: https://github.com/nathanmarz/storm/wiki/Documentation 44 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013
  • 45. Thank You! BASEL 45 BERN LAUSANNE ZÜRICH DÜSSELDORF Trivadis AG Guido Schmutz
 guido.schmutz@trivadis.com FRANKFURT A.M. 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

  • 46. Lambda Architecture
 Big Data and Fast Data combined Batch Layer Serving Layer Immutable data Batch View B Incoming Data C D A G Speed Layer Data Stream E 46 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013 Realtime View F Query