STORM
Buckle up Dorothy !!!
Distributed real-time computation
ABOUT
By Nathan Marz
Backtype => Twitter => Apache
Real-time analytics
WHAT IS IT GOOD FOR?
Online machine learning
Continuous computation
Distributed RPC
ETL (Extract, Transform, Load)
…
No data loss
Fault-tolerant
Scalable
PROMISES
Robust
VIEW FROM ABOVE
Storage
Topology
Stream
Source
Storm Cluster
Pull
(Kafka, *MQ, …)
Read/Write
PRIMITIVES
Field 1 / Value 1
Field 2 / Value 2
Field 3 / Value 3
Field 4 / Value 4
Field 5 / Value 5
Tuple
Tuple Tuple Tuple Tuple
Stream
Topology
Bolt
PRIMITIVES
Spout
Bolt
Spout
Bolt
Bolt
ABSTRACTION
PRIMITIVES
Tuples
Filters
Transformation
Incremental
Distributed
Scalable
Functions
Joins
Chaining streams
Small components
EFFECTS
Spouts
Bolts
CLUSTER
Nimbus Zookeeper Cluster
Worker Node
Executor
Supervisor
Executor
Executor
Worker Node
Executor
Supervisor
Executor
Executor
Worker Node
Executor
Supervisor
Executor
Executor
NIMBUS / NODES
CLUSTER
Small
No state
Communication
State
Robust
Kill / Restart easy
ZOOKEEPER
No data loss
Fault-tolerant
Scalable
AS PROMISED?
Robust
GUARANTEES
Message transforms into a tuple tree
Storm tracks tuple tree
Fully processed when tree exhausted
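One way to picture the tracking: Storm's acker keeps a single 64-bit value per spout tuple and XORs each tuple ID into it twice, once when the tuple joins the tree and once when it is acked, so the value returns to zero exactly when the tree is exhausted. Below is a minimal plain-Java sketch of that idea. The class and method names (`AckTracker`, `anchored`, `acked`) are illustrative assumptions, not Storm API, and the sketch assumes children are registered before their parent is acked.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of XOR-based tuple tree tracking; not Storm's actual classes. */
class AckTracker {
    // One running XOR value per root (spout) tuple.
    private final Map<Long, Long> pending = new HashMap<>();

    /** Registers tupleId as a new link in the tree rooted at rootId. */
    void anchored(long rootId, long tupleId) {
        pending.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    /** Marks tupleId as acked; returns true once the whole tree is exhausted. */
    boolean acked(long rootId, long tupleId) {
        Long current = pending.get(rootId);
        if (current == null) {
            throw new IllegalStateException("unknown root tuple " + rootId);
        }
        long value = current ^ tupleId;
        if (value == 0L) {
            pending.remove(rootId); // every anchored ID has been acked
            return true;
        }
        pending.put(rootId, value);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker();
        tracker.anchored(7L, 11L);                  // spout tuple enters the tree
        tracker.anchored(7L, 22L);                  // a bolt emits an anchored child
        System.out.println(tracker.acked(7L, 11L)); // parent acked, child outstanding
        System.out.println(tracker.acked(7L, 22L)); // last ack exhausts the tree
    }
}
```

The appeal of this design is constant memory per spout tuple, regardless of how large the tuple tree grows.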
FAILURES
Task died – failed tuples replayed
Acker task died – related tuples time out and are replayed
Spout task died – source replays, e.g. pending messages are placed back on the queue
WHAT DO I HAVE TO DO?
Inform about new links in tree
Inform when finished with a tuple
Every tuple must be acked or failed
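The three duties above can be sketched without the Storm library. The `Collector` interface below is a hypothetical stand-in (Storm's real output collector has a richer API), and `SplitBolt` and `RecordingCollector` are made-up names for illustration. The point is only the shape: anchor each emitted value to its input, then end every input in exactly one ack or fail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a collector; not Storm's actual interface. */
interface Collector {
    void emit(String anchor, String value); // new link in the tree, anchored to the input
    void ack(String tuple);                 // finished with the input tuple
    void fail(String tuple);                // ask for the input tuple to be replayed
}

/** Collector that records calls so the contract can be inspected. */
class RecordingCollector implements Collector {
    final List<String> log = new ArrayList<>();

    public void emit(String anchor, String value) { log.add("emit " + value); }
    public void ack(String tuple) { log.add("ack " + tuple); }
    public void fail(String tuple) { log.add("fail " + tuple); }
}

class SplitBolt {
    /** Splits a line into words; every input ends in exactly one ack or fail. */
    static void execute(String input, Collector out) {
        try {
            if (input.trim().isEmpty()) {
                throw new IllegalArgumentException("empty input");
            }
            for (String word : input.trim().split("\\s+")) {
                out.emit(input, word); // inform about a new link in the tree
            }
            out.ack(input);            // inform when finished with the tuple
        } catch (RuntimeException e) {
            out.fail(input);           // failed tuples are replayed from the source
        }
    }

    public static void main(String[] args) {
        RecordingCollector collector = new RecordingCollector();
        execute("buckle up", collector);
        execute("   ", collector);
        collector.log.forEach(System.out::println);
    }
}
```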
TRIDENT
ANYTHING SIMPLER?
High level abstraction
Stateful persistence primitives
Exactly-once semantics
AS PROMISED?
YES
USER DASHBOARD
PROBLEM
Bad performance
Uses core storage
Pre-compute
Customize
Fast
IDEA
Isolate
Quarter-hour aggregation
ARCHITECTURE
Core
Events
Queue
Kafka
4 Partitions
2 Replicas
Storm
4 Workers
MS SQL
4 Staging
Dashboard
Push
Pull Write
Read
State in source
KAFKA
9 8 7 6 5 4 3 2 1
New
Client
Topic Stacked
Flushed
Client offset
Replicated
Old
Partitioned
Fast
TRANSFORMATION
ORIGINAL
{
  id: "df45er87c78df",
  sender: "Info",
  destination: "39345123456",
  parts: 2,
  price: 100,
  client: "Demo",
  time: "2014-06-02 14:47:58",
  country: "IT",
  network: "Wind",
  type: "SMS",
  …
}
COMPUTED
{
  client: "Demo",
  type: "SMS",
  country: "IT",
  network: "Wind",
  bucket: "2014-06-02 14:45:00",
  traffic: 2,
  expenses: 200
}
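The step from the ORIGINAL to the COMPUTED record can be sketched in plain Java. The bucket rounds the timestamp down to the start of its quarter hour. The formulas for `traffic` (equal to `parts`) and `expenses` (`parts * price`) are assumptions inferred from the single example above, not stated in the deck, and `QuarterHour` is a made-up class name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Illustrative helpers for the ORIGINAL -> COMPUTED transformation. */
class QuarterHour {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Rounds a timestamp down to the start of its 15-minute bucket. */
    static String bucket(String time) {
        LocalDateTime t = LocalDateTime.parse(time, FORMAT);
        LocalDateTime start = t.truncatedTo(ChronoUnit.HOURS)
                .plusMinutes((t.getMinute() / 15) * 15L);
        return start.format(FORMAT);
    }

    /** Assumption: traffic counts message parts. */
    static long traffic(long parts) {
        return parts;
    }

    /** Assumption: expenses are parts multiplied by the per-part price. */
    static long expenses(long parts, long price) {
        return parts * price;
    }

    public static void main(String[] args) {
        System.out.println(bucket("2014-06-02 14:47:58"));
        System.out.println(traffic(2) + " " + expenses(2, 100));
    }
}
```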
CODE
TridentState tridentState = topology
    .newStream("CoreEvents", buildKafkaSpout())
    .parallelismHint(4)
    // Parse the raw Kafka message into named fields.
    .each(
        new Fields("bytes"),
        new CoreEventMessageParser(),
        new Fields("time", "client", "network", "country", "type", "parts", "price"))
    // Derive the quarter-hour bucket from the event time.
    .each(
        new Fields("time"),
        new QuarterTimeBucket(),
        new Fields("bucket"))
    // "traffic" and "expenses" must be produced by a step not shown on this slide.
    .project(new Fields("bucket", "client", "network", "country", "type", "traffic", "expenses"))
    .groupBy(new Fields("bucket", "client", "network", "country", "type"))
    // Keep running sums per group in the state store.
    .persistentAggregate(getStateFactory(),
        new Fields("traffic", "expenses"),
        new Sum(),
        new Fields("trafficExpenses"))
    .parallelismHint(8);
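What the group-by and aggregate steps compute can be modelled in a few lines of plain Java: one pair of running totals per (bucket, client, network, country, type) key. This sketch is not Trident; it ignores batching, persistence and exactly-once handling, and `BucketAggregator` with its methods is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative in-memory model of grouping and summing; not Trident state. */
class BucketAggregator {
    // Running totals per group key: index 0 is traffic, index 1 is expenses.
    private final Map<String, long[]> state = new LinkedHashMap<>();

    /** Builds the group key from the five grouping fields. */
    static String key(String bucket, String client, String network,
                      String country, String type) {
        return String.join("|", bucket, client, network, country, type);
    }

    /** Adds one event's traffic and expenses to its group's totals. */
    void add(String key, long traffic, long expenses) {
        long[] totals = state.computeIfAbsent(key, k -> new long[2]);
        totals[0] += traffic;
        totals[1] += expenses;
    }

    long traffic(String key) {
        long[] totals = state.get(key);
        return totals == null ? 0L : totals[0];
    }

    long expenses(String key) {
        long[] totals = state.get(key);
        return totals == null ? 0L : totals[1];
    }

    public static void main(String[] args) {
        BucketAggregator aggregator = new BucketAggregator();
        String k = key("2014-06-02 14:45:00", "Demo", "Wind", "IT", "SMS");
        aggregator.add(k, 2, 200);
        aggregator.add(k, 1, 100);
        System.out.println(aggregator.traffic(k) + " " + aggregator.expenses(k));
    }
}
```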
PERFORMANCE
            REGULAR    PEAK
KAFKA       1.500      60.000
            4.500      160.000
STORAGE     2.000      10.000
DASHBOARD   1          1
TUNING STORAGE
1st Issue - Storage
Random access – 1.500 w/s limit
Staged approach – 30.000 w/s limit
No locks – isolated
Scalable – each worker has its own stage
Main table indexing nicely
Doesn’t affect reading
STAGED WRITES
Worker 1
Main
Table
Merge
Worker 2
Stage
Table 1
Stage
Table 2
Merge
Write
Write
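The staged-write pattern can be illustrated with an in-memory analogy: each worker writes only to its own stage, and a merge step folds a stage into the main table. The deck uses MS SQL stage tables; the maps below are a single-threaded stand-in for those tables, and `StagedStore` with its methods is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative in-memory analogy of per-worker stage tables; not the deck's SQL. */
class StagedStore {
    private final Map<String, Long> main = new HashMap<>();
    private final Map<Integer, Map<String, Long>> stages = new HashMap<>();

    /** A worker writes only to its own stage, so workers never share a table. */
    void write(int worker, String key, long delta) {
        stages.computeIfAbsent(worker, w -> new HashMap<>())
              .merge(key, delta, Long::sum);
    }

    /** Folds one worker's stage into the main table and empties that stage. */
    void merge(int worker) {
        Map<String, Long> stage = stages.remove(worker);
        if (stage == null) {
            return; // nothing staged for this worker
        }
        stage.forEach((key, value) -> main.merge(key, value, Long::sum));
    }

    /** Readers see only merged data in the main table. */
    long read(String key) {
        return main.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        StagedStore store = new StagedStore();
        store.write(1, "Demo|SMS", 2);
        store.write(2, "Demo|SMS", 3);
        store.merge(1);
        store.merge(2);
        System.out.println(store.read("Demo|SMS"));
    }
}
```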
TUNING TOPOLOGY
2nd Issue - Serialization
[Chart: Raw/s, Expanded/s, Writes/s; scale 0 to 160,000; series 200 KB, 1 MB, 4 MB, 8 MB, 16 MB, 24 MB]
Plateauing
SERIALIZATION
[Chart: S [s], S [byte], S [% CPU], D [s], D [% CPU]; scale 0 to 1,200; series CSV (Plain), CSV (Deflate), CSV (GZip), Jackson (Plain), Jackson (GZip), Jackson Smile, Java Object, Kryo]
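A comparison like this can be started with nothing but the JDK. The sketch below produces the serialized bytes for three of the listed formats (plain CSV, GZip-compressed CSV and Java object serialization) so that their sizes can be printed and compared. It is an assumption-laden illustration, not the deck's benchmark: the class name `SerializationSizes` is made up, the CSV join does no escaping, and Jackson, Smile and Kryo are left out because they need external libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative size comparison using only JDK serializers. */
class SerializationSizes {
    /** Naive CSV: joins fields with commas, no quoting or escaping. */
    static byte[] csvPlain(String... fields) {
        return String.join(",", fields).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Requires Java 9+ for InputStream.readAllBytes. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Standard Java object serialization of the field array. */
    static byte[] javaObject(String[] fields) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(fields);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String[] event = {"Demo", "SMS", "IT", "Wind", "2014-06-02 14:45:00", "2", "200"};
        byte[] csv = csvPlain(event);
        System.out.println("CSV (Plain): " + csv.length + " bytes");
        System.out.println("CSV (GZip):  " + gzip(csv).length + " bytes");
        System.out.println("Java Object: " + javaObject(event).length + " bytes");
    }
}
```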
MEASURE
AXIS
Max spout pending
SQL workers
Kafka fetch speed
DB write speed
Kafka / DB ratio
Capacity
DB batch size
Kafka fetch size
Latency
METRICS
Serialization
…
MONITOR
STORM UI TOPOLOGY
METRICS
GRAPHITE
GOTCHAS
Version 0.9.1
Partially in flux
Kafka integration
Message & topology versioning
Performance tuning
Lambda Architecture
NEXT?
Master Dataset
Real-time Views
Serving Layer
Batch Layer
Speed Layer
New Data
Query
Query
Batch Views
RESOURCES
http://storm.incubator.apache.org
http://lambda-architecture.net
http://kafka.apache.org
PRESENTATION TOOLS
http://www.gimp.org
http://www.pictaculous.com
http://www.colourlovers.com
http://www.easycalculation.com
http://paletton.com
QUESTIONS?

Storm overview & integration