STORM
Buckle up, Dorothy!
Distributed real-time computation
ABOUT
By Nathan Marz
BackType => Twitter => Apache
WHAT IS IT GOOD FOR?
Real-time analytics
Online machine learning
Continuous computation
Distributed RPC
ETL (Extract, Transform, Load)
…
PROMISES
Robust
Fault-tolerant
Scalable
No data loss
VIEW FROM ABOVE
[Diagram: a topology on the Storm cluster pulls from a stream source (Kafka, MQ, …) and reads from / writes to storage]
PRIMITIVES
Tuple: a list of fields and their values (Field 1 / Value 1 … Field 5 / Value 5)
Stream: a sequence of tuples
Topology: a graph of spouts (stream sources) and bolts (processing steps)
ABSTRACTION
PRIMITIVES: Tuples, Spouts, Bolts
Filters, Transformation, Functions, Joins, Chaining streams
EFFECTS: Incremental, Distributed, Scalable, Small components
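To make the primitives and the chaining idea concrete, here is a plain-Java model that runs without Storm. It is not the Storm API: the `MiniTopology` class, its `Bolt` interface and the field names are hypothetical, tuples are modelled as maps and a stream as a list, and everything runs in a single thread instead of being distributed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Plain-Java model of spouts, bolts and chaining; this is NOT the Storm API.
// A tuple is modelled as a map from field name to value, a stream as a list of tuples.
final class MiniTopology {
    // A bolt consumes one tuple and emits zero or more tuples.
    interface Bolt extends Function<Map<String, Object>, List<Map<String, Object>>> {}

    // Filter: passes on only the tuples that match the predicate.
    static Bolt filter(Predicate<Map<String, Object>> keep) {
        return t -> {
            if (keep.test(t)) {
                return List.of(t);
            }
            return List.of();
        };
    }

    // Function: emits a copy of the tuple with one computed field appended.
    static Bolt function(String field, Function<Map<String, Object>, Object> f) {
        return t -> {
            Map<String, Object> out = new HashMap<>(t);
            out.put(field, f.apply(t));
            return List.of(out);
        };
    }

    // Chaining: every tuple emitted by one bolt is fed to the next bolt.
    static List<Map<String, Object>> run(List<Map<String, Object>> spoutOutput, List<Bolt> bolts) {
        List<Map<String, Object>> stream = spoutOutput;
        for (Bolt bolt : bolts) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> tuple : stream) {
                next.addAll(bolt.apply(tuple));
            }
            stream = next;
        }
        return stream;
    }
}
```

Each bolt stays small and only knows about one tuple at a time, which is what makes the chained pipeline easy to distribute in the real system.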
CLUSTER
[Diagram: Nimbus and a ZooKeeper cluster coordinate three worker nodes; each worker node runs a Supervisor and several executors]
CLUSTER
Nimbus / nodes: small, no state
ZooKeeper: communication, state
Robust: easy to kill and restart
AS PROMISED?
Robust
Fault-tolerant
Scalable
No data loss
GUARANTEES
Each incoming message becomes the root of a tuple tree
Storm tracks the tuple tree
The message is fully processed once every tuple in its tree has been processed
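Storm's documentation describes how the tuple tree is tracked cheaply: every tuple gets a random 64-bit id, and an acker task keeps one value per spout message into which each id is XORed once when the tuple joins the tree and once when it is acked. The value returns to zero once the tree is exhausted (with random ids an accidental zero is very unlikely). The sketch below models only that bookkeeping in plain Java; the class and method names are hypothetical, and in real Storm these updates travel to the acker as messages.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of XOR-based tuple-tree tracking (not Storm's acker code).
final class XorAcker {
    // One 64-bit "ack val" per spout message, keyed by the root tuple id.
    private final Map<Long, Long> ackVals = new HashMap<>();

    // A tuple joined the tree rooted at rootId (emitted by the spout or anchored by a bolt).
    void created(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // A tuple in the tree was acked: its id is XORed in a second time, cancelling it out.
    void acked(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // Every id has been XORed in exactly twice when the value is back to 0.
    boolean fullyProcessed(long rootId) {
        return ackVals.getOrDefault(rootId, 0L) == 0L;
    }
}
```

The memory cost is constant per spout message, no matter how large the tree grows.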
FAILURES
Task died – its failed tuples are replayed
Acker task died – the related tuples time out and are replayed
Spout task died – the source replays, e.g. pending messages are placed back on the queue
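The spout side of replay can be sketched as a queue plus a map of pending (emitted but not yet acked) messages: an ack drops the message, while a fail or a timeout puts it back on the queue. This is a hypothetical plain-Java model, not Storm's `ISpout` interface; the message itself doubles as its id, and the timeout is a constructor argument (Storm's own setting for this is `topology.message.timeout.secs`).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical reliable-spout model: unacked messages stay pending,
// failed or timed-out messages go back on the queue and are emitted again.
final class ReplayingSpout {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<String, Long> pending = new HashMap<>(); // message -> emit time (ms)
    private final long timeoutMillis;

    ReplayingSpout(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void offer(String message) {
        queue.addLast(message);
    }

    // Emit the next message (or null) and remember it until it is acked or failed.
    String nextTuple(long now) {
        String msg = queue.pollFirst();
        if (msg != null) {
            pending.put(msg, now);
        }
        return msg;
    }

    void ack(String msg) {
        pending.remove(msg);
    }

    // A failed message is placed back at the front of the queue for replay.
    void fail(String msg) {
        if (pending.remove(msg) != null) {
            queue.addFirst(msg);
        }
    }

    // Anything pending for at least the timeout is treated as failed.
    void expire(long now) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= timeoutMillis) {
                it.remove();
                queue.addFirst(e.getKey());
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```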
WHAT DO I HAVE TO DO?
Tell Storm about new links in the tree (anchoring)
Tell Storm when you are finished with a tuple (acking)
Every tuple must be acked or failed
TRIDENT
ANYTHING SIMPLER?
High-level abstraction
Stateful persistence primitives
Exactly-once semantics
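Trident's documentation explains exactly-once state updates with "transactional" state: batches are applied in order, a replayed batch carries the same transaction id and the same tuples, and each stored value keeps the id of the last batch that updated it. If the stored id equals the incoming one, the update is skipped. The sketch below shows only that rule in plain Java; `TransactionalCounts` and the key format are hypothetical, not Trident's `State` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the transactional-state rule (not Trident's API):
// every stored value remembers the txid of the batch that last updated it.
final class TransactionalCounts {
    private record Entry(long value, long txid) {}

    private final Map<String, Entry> store = new HashMap<>();

    // Apply one batch's partial sum for a key; a replayed batch (same txid) is ignored.
    void apply(String key, long delta, long txid) {
        Entry current = store.get(key);
        if (current != null && current.txid() == txid) {
            return; // already applied: this is a replay
        }
        long base = current == null ? 0 : current.value();
        store.put(key, new Entry(base + delta, txid));
    }

    long get(String key) {
        Entry e = store.get(key);
        return e == null ? 0 : e.value();
    }
}
```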
AS PROMISED?
YES
USER DASHBOARD
PROBLEM
Bad performance
Uses core storage
IDEA
Pre-compute
Customize
Fast
Isolate
Quarter-hour aggregates
ARCHITECTURE
[Diagram: Core pushes events to a Kafka queue (4 partitions, 2 replicas); Storm (4 workers) pulls from Kafka and writes to MS SQL (4 staging tables); the dashboard reads from MS SQL]
State in source
KAFKA
[Diagram: a topic as a stack of messages 1 (old) to 9 (new); the client's offset marks how far it has read]
Topic stacked
Flushed
Client offset
Replicated
Partitioned
Fast
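The property behind "State in source" is that a Kafka partition is an append-only log and each client keeps its own offset, so replaying just means reading again from an earlier offset. A minimal in-memory model of one partition (hypothetical class, not the Kafka client API; replication, flushing and retention are left out):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a single Kafka partition (not the Kafka client API).
final class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Producers append; the returned offset is the message's position in the log.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Consumers read up to max messages starting at their own offset.
    // Reading never removes anything, so any earlier range can be read again.
    List<String> read(long offset, int max) {
        int from = (int) Math.min(offset, messages.size());
        int to = Math.min(from + max, messages.size());
        return new ArrayList<>(messages.subList(from, to));
    }
}
```

Because the broker does not track what each client has consumed, a Storm spout that crashes can resume or rewind simply by using a stored offset.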
TRANSFORMATION
ORIGINAL
{
  id: "df45er87c78df",
  sender: "Info",
  destination: "39345123456",
  parts: 2,
  price: 100,
  client: "Demo",
  time: "2014-06-02 14:47:58",
  country: "IT",
  network: "Wind",
  type: "SMS",
  …
}
COMPUTED
{
  client: "Demo",
  type: "SMS",
  country: "IT",
  network: "Wind",
  bucket: "2014-06-02 14:45:00",
  traffic: 2,
  expenses: 200
}
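The transformation can be sketched as a few small functions. The deck does not show `QuarterTimeBucket`, so this is an assumed implementation that rounds the event time down to the start of its 15-minute bucket; `traffic = parts` and `expenses = parts * price` are inferred from the example values (2 and 2 × 100 = 200), not stated on the slide. `EventTransform` is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers mirroring the ORIGINAL -> COMPUTED example.
final class EventTransform {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Round a timestamp down to the start of its quarter hour (assumed bucketing rule).
    static String quarterBucket(String time) {
        LocalDateTime t = LocalDateTime.parse(time, FORMAT);
        LocalDateTime bucket = t.withMinute(t.getMinute() / 15 * 15).withSecond(0).withNano(0);
        return bucket.format(FORMAT);
    }

    // Assumed: one unit of traffic per message part.
    static long traffic(int parts) {
        return parts;
    }

    // Assumed: expenses are the per-part price times the number of parts.
    static long expenses(int parts, long price) {
        return parts * price;
    }
}
```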
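CODE
// Storm 0.9.x package names; later Apache releases use org.apache.storm.*
import backtype.storm.tuple.Fields;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Sum;

TridentTopology topology = new TridentTopology();
TridentState tridentState = topology
    // buildKafkaSpout(), CoreEventMessageParser, QuarterTimeBucket, getStateFactory(): author's helpers (not shown)
    .newStream("CoreEvents", buildKafkaSpout())
    .parallelismHint(4)
    .each(
        new Fields("bytes"),
        new CoreEventMessageParser(),
        new Fields("time", "client", "network", "country", "type", "parts", "price"))
    .each(
        new Fields("time"),
        new QuarterTimeBucket(),
        new Fields("bucket"))
    // Hypothetical step (TrafficExpenses is not on the slide): project() needs "traffic"
    // and "expenses", e.g. traffic = parts, expenses = parts * price
    .each(
        new Fields("parts", "price"),
        new TrafficExpenses(),
        new Fields("traffic", "expenses"))
    .project(new Fields("bucket", "client", "network", "country", "type", "traffic", "expenses"))
    .groupBy(new Fields("bucket", "client", "network", "country", "type"))
    // Caution: the built-in Sum reads only its first input field ("traffic");
    // summing "expenses" as well needs a custom CombinerAggregator
    .persistentAggregate(getStateFactory(),
        new Fields("traffic", "expenses"),
        new Sum(),
        new Fields("trafficExpenses"))
    .parallelismHint(8);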
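Trident's `CombinerAggregator` has three roles: `init` turns one tuple into a value, `combine` merges two values and `zero` is the value for an empty group. The plain-Java sketch below applies those roles to a `{traffic, expenses}` pair, which is one way to sum both fields per group. It does not implement the Trident interface, and `TrafficExpensesSum` and `Row` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical combiner over a {traffic, expenses} pair, shaped like
// Trident's CombinerAggregator (init / combine / zero) but without Storm.
final class TrafficExpensesSum {
    // One input row; key stands in for bucket/client/network/country/type.
    record Row(String key, long traffic, long expenses) {}

    static long[] init(Row row) {
        return new long[] {row.traffic(), row.expenses()};
    }

    static long[] combine(long[] a, long[] b) {
        return new long[] {a[0] + b[0], a[1] + b[1]};
    }

    static long[] zero() {
        return new long[] {0, 0};
    }

    // In-memory equivalent of groupBy(...) + aggregation: one total pair per key.
    static Map<String, long[]> aggregate(List<Row> rows) {
        Map<String, long[]> totals = new HashMap<>();
        for (Row row : rows) {
            long[] current = totals.getOrDefault(row.key(), zero());
            totals.put(row.key(), combine(current, init(row)));
        }
        return totals;
    }
}
```

Combining both sums in one value keeps a single state entry per group, so the dashboard can read traffic and expenses with one lookup.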
PERFORMANCE
REGULAR / PEAK
KAFKA: 1,500 / 60,000
4,500 / 160,000
STORAGE: 2,000 / 10,000
DASHBOARD: 1 / 1
TUNING STORAGE
1st Issue – Storage
Random access – 1,500 writes/s limit
Staged approach – 30,000 writes/s limit
No locks – isolated
Scalable – each worker has its own stage table
Main table stays nicely indexed
Does not affect reads
STAGED WRITES
[Diagram: Worker 1 writes to Stage Table 1 and Worker 2 writes to Stage Table 2; each stage table is merged into the Main Table]
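The staged approach can be modelled in a few lines: each worker appends only to its own stage table, so writers never share a table, and a merge step adds the staged rows to the main table and clears the stages. This is a single-threaded, in-memory sketch; the deck's actual MS SQL tables and merge statement are not shown, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded model of staged writes (not the deck's SQL).
final class StagedWrites {
    final Map<String, Long> mainTable = new HashMap<>();
    final List<List<Map.Entry<String, Long>>> stageTables = new ArrayList<>();

    StagedWrites(int workers) {
        for (int i = 0; i < workers; i++) {
            stageTables.add(new ArrayList<>());
        }
    }

    // Worker-local, append-only write: only this worker's stage table is touched.
    void write(int worker, String key, long delta) {
        stageTables.get(worker).add(Map.entry(key, delta));
    }

    // Merge: add every staged delta into the main table, then empty the stage tables.
    // A real deployment would have to coordinate this step with the writers.
    void merge() {
        for (List<Map.Entry<String, Long>> stage : stageTables) {
            for (Map.Entry<String, Long> row : stage) {
                mainTable.merge(row.getKey(), row.getValue(), Long::sum);
            }
            stage.clear();
        }
    }
}
```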
TUNING TOPOLOGY
2nd Issue – Serialization
[Chart: Raw/s, Expanded/s and Writes/s (0 to 160,000) at 200 KB, 1 MB, 4 MB, 8 MB, 16 MB and 24 MB; throughput plateaus]
SERIALIZATION
[Chart: serialization (S) time [s], size [bytes] and CPU [%], and deserialization (D) time [s] and CPU [%], on a 0 to 1,200 scale, for CSV (Plain), CSV (Deflate), CSV (GZip), Jackson (Plain), Jackson (GZip), Jackson Smile, Java Object and Kryo]
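A rough way to reproduce the size part of this comparison using only the JDK is to serialize the same batch as plain CSV, as gzipped CSV and with Java object serialization. Jackson, Smile and Kryo are external libraries and are left out, the sample rows are made up from the COMPUTED record with a varying id, and time and CPU are not measured (a benchmark harness such as JMH is the better tool for that). `SerializationSizes` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.zip.GZIPOutputStream;

// Hypothetical size comparison of three serialization options, JDK only.
final class SerializationSizes {
    // CSV (Plain): one line per made-up record.
    static byte[] csv(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append(i).append(",2014-06-02 14:45:00,Demo,Wind,IT,SMS,2,200\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // CSV (GZip): the same bytes run through the JDK's GZIP stream.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Java Object: standard JDK serialization of the same rows as String arrays.
    static byte[] javaObject(int records) {
        ArrayList<String[]> rows = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            rows.add(new String[] {String.valueOf(i), "2014-06-02 14:45:00", "Demo", "Wind", "IT", "SMS", "2", "200"});
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(rows);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Compression trades CPU for bytes, so size alone does not pick a winner; the chart's time and CPU columns matter just as much.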
MEASURE
AXIS
Max spout pending
SQL workers
DB batch size
Kafka fetch size
METRICS
Kafka fetch speed
DB write speed
Kafka / DB ratio
Capacity
Latency
Serialization
…
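Two of these metrics are simple ratios. `capacity` follows the definition given in the Storm UI (tuples executed × average execute latency ÷ measurement window; values near 1.0 mean the bolt is busy for the whole window), while reading "Kafka / DB ratio" as Kafka fetch rate over DB write rate is an assumption. `TopologyMetrics` is a hypothetical name.

```java
// Hypothetical helpers for two of the metrics listed above.
final class TopologyMetrics {
    // Storm UI capacity: fraction of the window the bolt spent executing tuples.
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        return executed * avgExecuteLatencyMs / windowMs;
    }

    // Assumed meaning: how many messages are fetched from Kafka per row written to the DB.
    static double kafkaDbRatio(double kafkaFetchPerSec, double dbWritesPerSec) {
        return kafkaFetchPerSec / dbWritesPerSec;
    }
}
```

For example, 300,000 executed tuples at 1.5 ms each over a 10-minute window give a capacity of 0.75.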
MONITOR
Storm UI
Topology metrics
Graphite
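Forwarding such metrics to Graphite can rely on its plaintext protocol: one `<path> <value> <timestamp>` line per data point, sent over TCP to port 2003 by default. The class and metric path below are hypothetical, and Storm's own metrics-consumer plumbing is not shown.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical Graphite plaintext-protocol writer.
final class GraphiteLine {
    // One data point: "<metric path> <value> <unix timestamp in seconds>\n".
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Sends one data point; requires a reachable Graphite (carbon) plaintext listener.
    static void send(String host, String path, double value, long epochSeconds) throws IOException {
        try (Socket socket = new Socket(host, 2003);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(format(path, value, epochSeconds));
        }
    }
}
```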
GOTCHAS
Version 0.9.1
Partially in flux
Kafka integration
Message & topology versioning
Performance tuning
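NEXT?
Lambda Architecture
[Diagram: new data feeds both the batch layer (master dataset → batch views, exposed by the serving layer) and the speed layer (real-time views); queries read both the batch views and the real-time views]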
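In the Lambda Architecture a query combines a precomputed batch view with the speed layer's real-time view, which covers only the data the batch layer has not processed yet. For additive metrics such as the dashboard's traffic and expenses, that merge can be a plain sum. A minimal sketch with both views as in-memory maps (an assumption; `LambdaQuery` is hypothetical):

```java
import java.util.Map;

// Hypothetical query-side merge of a batch view and a real-time view.
final class LambdaQuery {
    // Additive metric: batch result plus whatever arrived since the last batch run.
    static long query(Map<String, Long> batchView, Map<String, Long> realtimeView, String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }
}
```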
RESOURCES
http://storm.incubator.apache.org
http://lambda-architecture.net
http://kafka.apache.org
PRESENTATION TOOLS
http://www.gimp.org
http://www.pictaculous.com
http://www.colourlovers.com
http://www.easycalculation.com
http://paletton.com
QUESTIONS?

Storm overview & integration