SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Apache Storm
Parallel Real Time Computation
What’s Storm
• It’s a distributed real time computation system
• It’s free and open source
Storm Applications
• Real time analytics
• Online machine learning
• Distributed RPC
• Others
Storm Qualities
• Broad set of use cases
• Scalable
• Guaranteed no data loss
• Robust / Fault Tolerant
• Programming language agnostic
Storm Architecture
Streams
• A stream is an unbounded sequence of tuples.
• Streams are defined with a schema that names
the fields in the stream’s tuples.
Spouts
• Spouts - a spout is a source of streams for a
given topology.
• It will read data from an external source and emit
them into the topology as tuples.
Bolts
• A bolt is the processing element in the topology.
• Bolts can do simple stream transformations like:
filtering, aggregations, functions, joins, etc.
Topologies
• A topology contains all the logic for the realtime
application.
• A topology is a graph of spouts and bolts that are
connected by stream groupings.
Tasks
• Each spout or bolt executes as many tasks
across the cluster.
• Each task corresponds to one thread of
execution.
• Stream groupings define how to send tuples from
one set of tasks to another set of tasks.
Stream Groupings
• A stream grouping defines for a given bolt which
streams it should receive as input.
• A stream grouping also defines how the stream’s
tuples are partitioned among the bolt tasks.
Shuffle Grouping
• Tuples are randomly distributed across the bolt's
tasks in a way such that each bolt is guaranteed
to get an equal number of tuples.
Fields Grouping
• The stream is partitioned by the fields specified in
the grouping.
• If the stream is grouped by the "user-id" field,
tuples with the same "user-id" will always go to
the same task.
Global Grouping
• The entire stream goes to a single one of the
bolt's tasks. Specifically, it goes to the task with
the lowest id.
Workers
• Topologies execute across one or more worker
processes.
• Each worker process is a physical JVM and
executes a subset of all the tasks for the
topology.
• If the combined parallelism of the topology is 300
and 50 workers are allocated, then each worker
will execute 6 tasks
A Basic Storm
Topology
A (not so) Basic Storm
Topology
Demo
Thanks!

Weitere ähnliche Inhalte

Was ist angesagt?

Enhanced Provenance Model (TAP): Time-aware Provenance for Distributed Systems
Enhanced ProvenanceModel (TAP): Time-awareProvenance for Distributed SystemsEnhanced ProvenanceModel (TAP): Time-awareProvenance for Distributed Systems
Enhanced Provenance Model (TAP): Time-aware Provenance for Distributed Systems Athiq Ahamed
 
Introsort or introspective sort
Introsort or introspective sortIntrosort or introspective sort
Introsort or introspective sortEftykhar Mahmud
 
The Road to Reproducible Computational Research
The Road to Reproducible Computational ResearchThe Road to Reproducible Computational Research
The Road to Reproducible Computational ResearchAndrey Moskalenko
 
FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...
FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...
FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...James Salter
 
Usage of Moving Average
Usage of Moving AverageUsage of Moving Average
Usage of Moving AverageKwanghee Choi
 
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack FilteringReal-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack FilteringKohei Hayashi
 
work load characterization
work load characterizationwork load characterization
work load characterizationRaghu Golla
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...Daiki Tanaka
 
Using Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchUsing Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchFaithWestdorp
 
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie SatterlySeattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterlybtoddb
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 BasicFlink Forward
 
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...Flink Forward
 
Effective java item 80 and 81
Effective java   item 80 and 81Effective java   item 80 and 81
Effective java item 80 and 81Isaac Liao
 

Was ist angesagt? (15)

Enhanced Provenance Model (TAP): Time-aware Provenance for Distributed Systems
Enhanced ProvenanceModel (TAP): Time-awareProvenance for Distributed SystemsEnhanced ProvenanceModel (TAP): Time-awareProvenance for Distributed Systems
Enhanced Provenance Model (TAP): Time-aware Provenance for Distributed Systems
 
Introsort or introspective sort
Introsort or introspective sortIntrosort or introspective sort
Introsort or introspective sort
 
The Road to Reproducible Computational Research
The Road to Reproducible Computational ResearchThe Road to Reproducible Computational Research
The Road to Reproducible Computational Research
 
FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...
FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...
FCS 05: A Multi-Ring Method for Efficient Multi-Dimensional Data Lookup in P2...
 
FlowSim_presentation
FlowSim_presentationFlowSim_presentation
FlowSim_presentation
 
Usage of Moving Average
Usage of Moving AverageUsage of Moving Average
Usage of Moving Average
 
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack FilteringReal-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
 
work load characterization
work load characterizationwork load characterization
work load characterization
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
 
Using Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor searchUsing Elastiknn for exact and approximate nearest neighbor search
Using Elastiknn for exact and approximate nearest neighbor search
 
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie SatterlySeattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
Distributed systems scheduling
Distributed systems schedulingDistributed systems scheduling
Distributed systems scheduling
 
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
 
Effective java item 80 and 81
Effective java   item 80 and 81Effective java   item 80 and 81
Effective java item 80 and 81
 

Ähnlich wie Apache Storm Basics

Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Stormjustinjleet
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleDung Ngua
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# WayBishnu Rawal
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5Shah Zaib
 
chapter4-processes nd processors in DS.ppt
chapter4-processes nd processors in DS.pptchapter4-processes nd processors in DS.ppt
chapter4-processes nd processors in DS.pptaakarshsiwani1
 
Slide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMSlide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMMd. Shamsur Rahim
 
Slide #2: Setup Apache Storm
Slide #2: Setup Apache StormSlide #2: Setup Apache Storm
Slide #2: Setup Apache StormMd. Shamsur Rahim
 
Storm presentation
Storm presentationStorm presentation
Storm presentationShyam Raj
 

Ähnlich wie Apache Storm Basics (20)

1 storm-intro
1 storm-intro1 storm-intro
1 storm-intro
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Storm
StormStorm
Storm
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Storm
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Cc module 3.pptx
Cc module 3.pptxCc module 3.pptx
Cc module 3.pptx
 
Streams in Java 8
Streams in Java 8Streams in Java 8
Streams in Java 8
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# Way
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
 
chapter4-processes nd processors in DS.ppt
chapter4-processes nd processors in DS.pptchapter4-processes nd processors in DS.ppt
chapter4-processes nd processors in DS.ppt
 
Slide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMSlide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROM
 
Slide #2: Setup Apache Storm
Slide #2: Setup Apache StormSlide #2: Setup Apache Storm
Slide #2: Setup Apache Storm
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 

Mehr von João Paulo Leonidas Fernandes Dias da Silva (7)

Apache spark intro
Apache spark introApache spark intro
Apache spark intro
 
Kafka basics
Kafka basicsKafka basics
Kafka basics
 
Query driven development
Query driven developmentQuery driven development
Query driven development
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
OpenCL Heterogeneous Parallel Computing
OpenCL Heterogeneous Parallel ComputingOpenCL Heterogeneous Parallel Computing
OpenCL Heterogeneous Parallel Computing
 
Unit testing basics
Unit testing basicsUnit testing basics
Unit testing basics
 
Qcon Rio 2015 - Data Lakes Workshop
Qcon Rio 2015 - Data Lakes WorkshopQcon Rio 2015 - Data Lakes Workshop
Qcon Rio 2015 - Data Lakes Workshop
 

Kürzlich hochgeladen

Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Kürzlich hochgeladen (20)

Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Apache Storm Basics

  • 1. Apache Storm Parallel Real Time Computation
  • 2. What’s Storm • It’s a distributed real time computation system • It’s free and open source
  • 3. Storm Applications • Real time analytics • Online machine learning • Distributed RPC • Others
  • 4. Storm Qualities • Broad set of use cases • Scalable • Guaranteed no data loss • Robust / Fault Tolerant • Programming language agnostic
  • 6. Streams • A stream is an unbounded sequence of tuples. • Streams are defined with a schema that names the fields in the stream’s tuples.
  • 7. Spouts • Spouts - a spout is a source of streams for a given topology. • It will read data from an external source and emit them into the topology as tuples.
  • 8. Bolts • A bolt is the processing element in the topology. • Bolts can do simple stream transformations like: filtering, aggregations, functions, joins, etc.
  • 9. Topologies • A topology contains all the logic for the realtime application. • A topology is a graph of spouts and bolts that are connected by stream groupings.
  • 10. Tasks • Each spout or bolt executes as many tasks across the cluster. • Each task corresponds to one thread of execution. • Stream groupings define how to send tuples from one set of tasks to another set of tasks.
  • 11. Stream Groupings • A stream grouping defines for a given bolt which streams it should receive as input. • A stream grouping also defines how the stream’s tuples are partitioned among the bolt tasks.
  • 12. Shuffle Grouping • Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
  • 13. Fields Grouping • The stream is partitioned by the fields specified in the grouping. • If the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task.
  • 14. Global Grouping • The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
  • 15. Workers • Topologies execute across one or more worker processes. • Each worker process is a physical JVM and executes a subset of all the tasks for the topology. • If the combined parallelism of the topology is 300 and 50 workers are allocated, then each worker will execute 6 tasks
  • 17. A (not so) Basic Storm Topology
  • 18. Demo