Introduction to Apache Storm - Concept & Example

Introduction to Apache Storm:
- Storm Concept: topology, tuple, stream, spout, bolt, stream grouping
- Storm Component: Master and Worker
- Example: GitHub Commit Feed

  1. APACHE STORM
     Viet-Dung TRINH (Bill), 03/2016
     Saltlux – Vietnam Development Center
  2. Agenda
     - Overview
     - Core Storm Concepts
     - Components of Storm Cluster
     - Example
  3. Overview
     - Apache Storm is a free and open source distributed real-time computation system.
     - Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.
     - Storm is fast: a benchmark clocked it at over a million tuples processed per second per node.
     - Storm can be used with any programming language.
  4. Overview (cont.)
     - Use cases:
       - Real-time analytics
       - Online machine learning
       - Continuous computation
       - …
     - Integration: Storm works with any queueing system and any database system, for example:
       - Kafka
       - Kestrel
       - RabbitMQ / AMQP
       - JMS
       - Amazon Kinesis
  5. Core Storm Concepts
     - Topology
     - Tuple
     - Stream
     - Spout
     - Bolt
     - Stream grouping
  6. Core Storm Concepts: Topology (cont.)
     - Topology: a graph of computation, consisting of nodes and edges.
     - Nodes represent individual computations.
     - Edges represent the data being passed between nodes.
  7. Core Storm Concepts: Tuple (cont.)
     - Nodes in a topology send data in the form of tuples.
     - Tuple: an ordered list of values, where each value is assigned a name.
     - The process of sending a tuple is called emitting a tuple.
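
The tuple definition on slide 7 can be made concrete with a small sketch. This is a plain-Python model, not Storm's own tuple class: the names `SketchTuple`, `get_by_index`, `get_by_field` and `emit`, as well as the sample commit ID and email address, are assumptions made up for illustration.

```python
class SketchTuple:
    """Illustrative model only: ordered values, each with a field name."""

    def __init__(self, fields, values):
        if len(fields) != len(values):
            raise ValueError("each value needs exactly one field name")
        self.fields = list(fields)
        self.values = list(values)

    def get_by_index(self, i):
        # Values keep their order, so positional access works.
        return self.values[i]

    def get_by_field(self, name):
        # Each value is also reachable through its assigned name.
        return self.values[self.fields.index(name)]


def emit(stream, tup):
    """'Emitting' here just appends to a list that stands in for a stream."""
    stream.append(tup)


stream = []
emit(stream, SketchTuple(["commit_id", "email"], ["b20ea50", "dev@example.com"]))
emit(stream, SketchTuple(["email"], ["dev@example.com"]))
```

The two emitted tuples mirror the COMMIT and EMAIL tuple types that the example later in the deck uses.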
  8. Core Storm Concepts: Stream (cont.)
     - Stream: an unbounded sequence of tuples between two nodes in a topology.
     - A topology can contain any number of streams.
  9. Core Storm Concepts: Spout (cont.)
     - Spout: the source of a stream in a topology.
     - It reads data from an external data source and emits tuples into the topology.
  10. Core Storm Concepts: Bolt (cont.)
     - Bolt: accepts a tuple from its input stream, performs some computation or transformation on it (filtering, aggregation, join), and optionally emits one or more new tuples.
  11. Core Storm Concepts: Stream Grouping
     - A stream grouping defines how tuples are sent between instances of spouts and bolts.
     - The two most common groupings are shuffle grouping and fields grouping.
     - Shuffle grouping: tuples are distributed to the instances of a bolt at random.
     - Fields grouping: tuples with the same value for a particular field name are always sent to the same bolt instance.
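
The difference between the two groupings on slide 11 may be easier to see in a small simulation. The sketch below is plain Python and does not use Storm; the helper names `shuffle_grouping` and `fields_grouping`, the choice of CRC32 as the hash, and the sample addresses are all assumptions for illustration, and Storm's actual implementation may differ in detail.

```python
import random
import zlib


def shuffle_grouping(num_tasks, rng):
    """Pick a downstream bolt instance at random, ignoring the tuple's contents."""
    return rng.randrange(num_tasks)


def fields_grouping(value, num_tasks):
    """Pick a bolt instance from a stable hash of the grouping field's value."""
    return zlib.crc32(value.encode("utf-8")) % num_tasks


emails = [
    "a@example.com",
    "b@example.com",
    "a@example.com",
    "c@example.com",
    "a@example.com",
]
num_tasks = 3
rng = random.Random(0)  # seeded only so that repeated runs behave the same

shuffle_targets = [shuffle_grouping(num_tasks, rng) for _ in emails]
fields_targets = [fields_grouping(e, num_tasks) for e in emails]

for email, s, f in zip(emails, shuffle_targets, fields_targets):
    print(f"{email}: shuffle -> instance {s}, fields -> instance {f}")
```

With fields grouping, every tuple carrying `a@example.com` lands on the same instance, which is what lets a counting bolt keep a correct per-email total in local memory. Shuffle grouping gives no such guarantee.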
  12. Components of Storm Cluster
     - A cluster has two kinds of nodes: master and worker.
     - The master node runs a daemon called Nimbus.
     - Each worker node runs a daemon called Supervisor.
     - All coordination between Nimbus and the Supervisors is done through ZooKeeper.
  13. Example: GitHub Commit Feed
  14. Example: GitHub Commit Feed (cont.)
     - Each commit comes into the feed as a single string containing COMMIT_ID, followed by a SPACE, followed by EMAIL.
  15. Breaking Down the Problem
     - Component 1: reads from the live feed of commits and produces a single commit message.
     - Component 2: accepts a single commit message, extracts the developer's email from that commit, and produces the email.
     - Component 3: accepts a developer's email and updates an in-memory map where the key is the email and the value is the number of commits for that email.
  16. Breaking Down the Problem (cont.)
  17. Tuples
     - There are two types of tuple in the topology:
     - COMMIT: contains the commit_id and email.
     - EMAIL: contains the developer's email.
  18. Spout
     - Listens to the real-time feed of commits being made to the repository.
  19. Bolts
     - 1st bolt: extracts the developer's email.
     - 2nd bolt: updates the map of emails to commit counts.
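
The spout and the two bolts from slides 15 to 19 can be sketched as a chain of plain-Python generators. This is a minimal stand-in under stated assumptions rather than a Storm topology: the function names are hypothetical, the "live feed" is a fixed list, and the commit IDs and email addresses are invented sample data.

```python
from collections import Counter


def commit_feed_spout(lines):
    """Stand-in for the spout: yields one commit string per feed entry."""
    for line in lines:
        yield line


def email_extractor_bolt(commits):
    """First bolt: split 'COMMIT_ID EMAIL' and pass on only the email."""
    for commit in commits:
        _commit_id, email = commit.split(" ", 1)
        yield email


def email_counter_bolt(emails):
    """Second bolt: keep an in-memory map of email -> number of commits."""
    counts = Counter()
    for email in emails:
        counts[email] += 1
    return counts


# Invented sample feed in the format described on slide 14.
feed = [
    "b20ea50 nathan@example.com",
    "064874b andy@example.com",
    "28e4f8e andy@example.com",
]

counts = email_counter_bolt(email_extractor_bolt(commit_feed_spout(feed)))
print(dict(counts))
```

In a real topology the stream would be unbounded and the counting bolt would typically run as several instances. That is why the email stream would need a fields grouping on the email field, so that each address is always counted by the same instance.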
  20. References
     [1] Apache Storm, http://storm.apache.org
     [2] Sean T. Allen, Matthew Jankowski, Peter Pathirana, Storm Applied, 2015
  21. Thank you!