SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
APACHE STORM
Viet-Dung TRINH (Bill), 03/2016
Saltlux – Vietnam Development Center
Agenda
•  Overview
•  Core Storm Concepts
•  Components of Storm Cluster
•  Example
Overview
•  Apache Storm is a free and open source distributed real-time
computation system.
•  Storm makes it easy to reliably process unbounded streams of
data, doing for real-time processing what Hadoop did for batch
processing.
•  Storm is fast (million tuples processed/second/node)
•  Can be used with any programming language
Overview (cont)
•  Use cases:
•  Real-time analytics,
•  Online machine learning,
•  Continuous computation
•  …
•  Integration: with any queueing and any database system
such as:
•  Kafka
•  Kestrel
•  RabbitMG/ AMQP
•  JMS
•  Amazon Kinesis
Core Storm Concepts
•  Topology
•  Tuple
•  Stream
•  Spout
•  Bolt
•  Stream grouping
Core Storm Concepts: Topology (cont)
•  Topology: is a graph of computation, consits of NODEs
and EDGEs.
•  Nodes: represent some individual computations.
•  Edges: represent the data being passed between nodes.
Core Storm Concepts: Tuple (cont)
•  Nodes in topology send data in form of tuples
•  Tuple: is ordered list of values, where each value is
assigned a name
•  Processing of sending a tuple is called emitting tuple
Core Storm Concepts: Stream (cont)
•  Stream: is an unbounded sequence of tuples between two
nodes in topology.
•  A topology can contain any number of streams
Core Storm Concepts: Spout (cont)
•  Spout: is the source of stream in topology
•  Read data from external data source and emits tuples into
topology.
Core Storm Concepts: Bolt (cont)
•  Bolt: accepts a tuple from its input stream, performs some
computation or transformation – filtering, aggregation, join
– on tuple, and optional emits a new tuple(s)
Core Storm Concepts: Stream Grouping
•  Defines how tuples are sent between instance of spouts
and bolts.
•  Two most common groupings: shuffle grouping and fields
grouping
•  SHUFFLE GROUPING: type of stream grouping where
tuples are emitted to bolts at random.
•  FIELDS GROUPING: ensures that tuples with the same
value for a particular field name are always emitted to the
same bolt.
Components of Storm Cluster
•  Two kinds of nodes: Master and Worker
•  Master node runs daemon called Nimbus
•  Worker node runs daemon called Supervisor
•  All coordination between Nimbus and Supervisor is done
through Zookeeper.
Example: GitHub Commit Feed
Example: GitHub Commit Feed (cont)
•  Each commit comes into feed as single string containing
COMMIT_ID, followed by a SPACE, followed by EMAIL.
Breaking Down the Problem
•  Component: reads from live feed of
commits and produces single
commit message
•  Component: accepts single commit
message, extracts the developer’s
email from that commit, produces
email
•  Component: accepts developer’s
email and updates in-memory map
where key is email and value is
number of commits for that email.
Breaking Down the Problem (cont)
Tuples
•  Two types of tuple in
topology
•  COMMIT: contain
commit_id and email
•  EMAIL: developer
email
Spout
•  Listen to real-time feed of
commits being made to
repository
Bolts
•  1st Bolt: extracts developer’s
email
•  2nd Bolt: updates map of
emails to commit counts
References
[1]. Apache Storm, http://storm.apache.org
[2]. Sean T. Allen, Matthew Jankowski, Peter Pathirana,
Storm Applied, 2015
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationUday Vakalapudi
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Apache Storm Concepts
Apache Storm ConceptsApache Storm Concepts
Apache Storm ConceptsAndré Dias
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs stormTrong Ton
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.DECK36
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter StormUwe Printz
 
Storm: Distributed and fault tolerant realtime computation
Storm: Distributed and fault tolerant realtime computationStorm: Distributed and fault tolerant realtime computation
Storm: Distributed and fault tolerant realtime computationFerran Galí Reniu
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 

Was ist angesagt? (20)

Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Storm
StormStorm
Storm
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integration
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
STORM
STORMSTORM
STORM
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Apache Storm Concepts
Apache Storm ConceptsApache Storm Concepts
Apache Storm Concepts
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs storm
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter Storm
 
Storm: Distributed and fault tolerant realtime computation
Storm: Distributed and fault tolerant realtime computationStorm: Distributed and fault tolerant realtime computation
Storm: Distributed and fault tolerant realtime computation
 
Storm Anatomy
Storm AnatomyStorm Anatomy
Storm Anatomy
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 

Ähnlich wie APACHE STORM: A FREE AND OPEN SOURCE DISTRIBUTED REAL-TIME COMPUTATION SYSTEM

Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Stormjustinjleet
 
Ruby Microservices with RabbitMQ
Ruby Microservices with RabbitMQRuby Microservices with RabbitMQ
Ruby Microservices with RabbitMQZoran Majstorovic
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Low Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopLow Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopInSemble
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesOleksii Diagiliev
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPrashant Rane
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewPoo Kuan Hoong
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlowDarshan Patel
 
A Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor NetworksA Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor NetworksMichele Weigle
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Amazon Web Services
 

Ähnlich wie APACHE STORM: A FREE AND OPEN SOURCE DISTRIBUTED REAL-TIME COMPUTATION SYSTEM (20)

Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Storm
 
Ruby Microservices with RabbitMQ
Ruby Microservices with RabbitMQRuby Microservices with RabbitMQ
Ruby Microservices with RabbitMQ
 
1 storm-intro
1 storm-intro1 storm-intro
1 storm-intro
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Low Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopLow Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in Hadoop
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Apache Storm Basics
Apache Storm BasicsApache Storm Basics
Apache Storm Basics
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCD
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Storm
StormStorm
Storm
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
A Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor NetworksA Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor Networks
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
 

Kürzlich hochgeladen

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Kürzlich hochgeladen (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

APACHE STORM: A FREE AND OPEN SOURCE DISTRIBUTED REAL-TIME COMPUTATION SYSTEM

  • 1. APACHE STORM Viet-Dung TRINH (Bill), 03/2016 Saltlux – Vietnam Development Center
  • 2. Agenda •  Overview •  Core Storm Concepts •  Components of Storm Cluster •  Example
  • 3. Overview •  Apache Storm is a free and open source distributed real-time computation system. •  Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. •  Storm is fast (million tuples processed/second/node) •  Can be used with any programming language
  • 4. Overview (cont) •  Use cases: •  Real-time analytics, •  Online machine learning, •  Continuous computation •  … •  Integration: with any queueing and any database system such as: •  Kafka •  Kestrel •  RabbitMG/ AMQP •  JMS •  Amazon Kinesis
  • 5. Core Storm Concepts •  Topology •  Tuple •  Stream •  Spout •  Bolt •  Stream grouping
  • 6. Core Storm Concepts: Topology (cont) •  Topology: is a graph of computation, consits of NODEs and EDGEs. •  Nodes: represent some individual computations. •  Edges: represent the data being passed between nodes.
  • 7. Core Storm Concepts: Tuple (cont) •  Nodes in topology send data in form of tuples •  Tuple: is ordered list of values, where each value is assigned a name •  Processing of sending a tuple is called emitting tuple
  • 8. Core Storm Concepts: Stream (cont) •  Stream: is an unbounded sequence of tuples between two nodes in topology. •  A topology can contain any number of streams
  • 9. Core Storm Concepts: Spout (cont) •  Spout: is the source of stream in topology •  Read data from external data source and emits tuples into topology.
  • 10. Core Storm Concepts: Bolt (cont) •  Bolt: accepts a tuple from its input stream, performs some computation or transformation – filtering, aggregation, join – on tuple, and optional emits a new tuple(s)
  • 11. Core Storm Concepts: Stream Grouping •  Defines how tuples are sent between instance of spouts and bolts. •  Two most common groupings: shuffle grouping and fields grouping •  SHUFFLE GROUPING: type of stream grouping where tuples are emitted to bolts at random. •  FIELDS GROUPING: ensures that tuples with the same value for a particular field name are always emitted to the same bolt.
  • 12. Components of Storm Cluster •  Two kinds of nodes: Master and Worker •  Master node runs daemon called Nimbus •  Worker node runs daemon called Supervisor •  All coordination between Nimbus and Supervisor is done through Zookeeper.
  • 14. Example: GitHub Commit Feed (cont) •  Each commit comes into feed as single string containing COMMIT_ID, followed by a SPACE, followed by EMAIL.
  • 15. Breaking Down the Problem •  Component: reads from live feed of commits and produces single commit message •  Component: accepts single commit message, extracts the developer’s email from that commit, produces email •  Component: accepts developer’s email and updates in-memory map where key is email and value is number of commits for that email.
  • 16. Breaking Down the Problem (cont)
  • 17. Tuples •  Two types of tuple in topology •  COMMIT: contain commit_id and email •  EMAIL: developer email
  • 18. Spout •  Listen to real-time feed of commits being made to repository
  • 19. Bolts •  1st Bolt: extracts developer’s email •  2nd Bolt: updates map of emails to commit counts
  • 20. References [1]. Apache Storm, http://storm.apache.org [2]. Sean T. Allen, Matthew Jankowski, Peter Pathirana, Storm Applied, 2015