Storm

  1. Apache Storm. Course Instructor: Dr. Zarifzadeh. Presented by: Pouyan Rezazadeh, Ali Rezaie.
  2. Introduction. Hadoop and related technologies have made it possible to store and process data at large scale. Unfortunately, these data processing technologies are not realtime systems: Hadoop does batch processing instead of realtime processing.
  3. Introduction. Batch processing handles jobs in batches, and a batch job can take hours (e.g. a billing system). Realtime processing handles jobs one by one, immediately (e.g. an airline system).
  4. Introduction. Realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem, and there is no hack that will turn Hadoop into a realtime system. The solution: Apache Storm.
  5. Apache Storm. A distributed realtime computation system, created in 2011 and implemented in Clojure (a dialect of Lisp) with some Java.
  6. Advantages. Free, simple, and open source; usable with any programming language; very fast; scalable; fault-tolerant; guarantees your data will be processed; integrates with any database technology; extremely robust.
  7. Storm Use Cases. [logos of companies using Storm] And many others …
  8. Storm vs Hadoop. A Storm cluster is superficially similar to a Hadoop cluster. Hadoop runs "MapReduce jobs", while Storm runs "topologies". A MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).
  9. Spouts and Bolts. [illustration of spouts and bolts]
  10. Spouts and Bolts. A stream is an unbounded sequence of tuples. A spout is a source of streams. [topology diagram: Spout 1 and Spout 2 feeding Bolts 1–4]
  11. Spouts and Bolts. For example, a spout may read tuples off of a queue and emit them as a stream.
  12. Spouts and Bolts. A bolt consumes any number of input streams, does some processing, and possibly emits new streams.
  13. Spouts and Bolts. Each node (spout or bolt) in a Storm topology executes in parallel.
  14. Architecture. A machine in a Storm cluster may run one or more worker processes. Each topology has one or more worker processes. Each worker process runs executors (threads) for a specific topology, and each executor runs one or more tasks of the same component (spout or bolt). [diagram: a worker process containing executors, each running several tasks]
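The worker/executor/task relationship can be expressed when a topology is built. A minimal sketch, assuming the SentenceSpout and SplitSentenceBolt classes shown later in this deck; the component ids and the numbers chosen here are illustrative, not from the slides:

```java
import org.apache.storm.Config;
import org.apache.storm.topology.TopologyBuilder;

public class ParallelismSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // 2 executors (threads) for the spout, one task each by default.
        builder.setSpout("sentence-spout", new SentenceSpout(), 2);

        // 2 executors for the bolt but 4 tasks, so each executor runs 2 tasks.
        builder.setBolt("split-bolt", new SplitSentenceBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("sentence-spout");

        Config config = new Config();
        // Spread the topology's executors over 2 worker processes.
        config.setNumWorkers(2);
    }
}
```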
  15. Architecture. [cluster diagram: Nimbus, a ZooKeeper ensemble, and several Supervisors] Storm's daemons map onto Hadoop v1's as follows. Hadoop v1 JobTracker ↔ Storm Nimbus (only 1): distributes code around the cluster, assigns tasks to machines/supervisors, and monitors for failures. Hadoop v1 TaskTracker ↔ Storm Supervisor (many): listens for work assigned to its machine, and starts and stops worker processes as necessary based on Nimbus. ZooKeeper: coordination between Nimbus and the Supervisors.
  16. Architecture. Nimbus and the Supervisors are stateless; all state is kept in ZooKeeper (one ZK instance per machine). When Nimbus or a Supervisor fails, it starts back up as if nothing had happened. A topology is submitted to the cluster with a command like: storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2
  17. Architecture. A running topology consists of many worker processes spread across many machines. [diagram: a topology spanning two worker processes, each running several tasks]
  18. Topology With Tasks in Detail. [detailed diagram of a topology with its tasks]
  19. Stream Groupings. Shuffle grouping: randomized round-robin. Fields grouping: all tuples with the same field value(s) are always routed to the same task. Direct grouping: the producer of the tuple decides which task of the consumer will receive it.
  20. Sample Configuration Code. TopologyBuilder topologyBuilder = new TopologyBuilder();
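The slide shows only the first line of the configuration. A fuller sketch of how a topology might be wired using the stream groupings from the previous slide, assuming the word-count components defined later in this deck (component ids and parallelism hints are illustrative assumptions):

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class ConfigurationSketch {
    public static void main(String[] args) {
        TopologyBuilder topologyBuilder = new TopologyBuilder();

        // A spout with 2 executors; SentenceSpout is defined later in the deck.
        topologyBuilder.setSpout("sentence-spout", new SentenceSpout(), 2);

        // Shuffle grouping: tuples are distributed round-robin across the tasks.
        topologyBuilder.setBolt("split-bolt", new SplitSentenceBolt(), 4)
                       .shuffleGrouping("sentence-spout");

        // Fields grouping: tuples with the same "word" value always go to the
        // same task, so each task keeps a consistent count for its words.
        topologyBuilder.setBolt("count-bolt", new WordCountBolt(), 4)
                       .fieldsGrouping("split-bolt", new Fields("word"));
    }
}
```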
  21. Fault Tolerance. Workers heartbeat back to Nimbus via ZooKeeper.
  22. Fault Tolerance. When a worker dies, its supervisor will restart it.
  23. Fault Tolerance. If the worker continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reschedule it on another machine.
  24. Fault Tolerance. If a supervisor node dies, Nimbus will reassign its work to other nodes.
  25. Fault Tolerance. If Nimbus dies, topologies continue to function normally, but no reassignments can be performed.
  26. Fault Tolerance. This is in contrast to Hadoop, where if the JobTracker dies, all running jobs are lost.
  27. Fault Tolerance. Preferably run ZooKeeper with 3 or more nodes, so that the failure of 1 ZK server can be tolerated.
  28. A Sample Word Count Topology. Sentence Spout → Split Sentence Bolt → Word Count Bolt → Report Bolt (prints the contents). Example tuples along the stream: { "sentence": "my dog has fleas" } → { "word": "my" }, { "word": "dog" }, { "word": "has" }, { "word": "fleas" } → { "word": "dog", "count": 5 }
  29. A Sample Word Count Code.
  public class SentenceSpout extends BaseRichSpout {
      private SpoutOutputCollector collector;
      private String[] sentences = {
          "my dog has fleas",
          "i like cold beverages",
          "the dog ate my homework",
          "don't have a cow man",
          "i don't think i like fleas"
      };
      private int index = 0;

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("sentence"));
      }

      public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
          this.collector = collector;
      }

      // Emit the sample sentences in a round-robin loop, one per call.
      public void nextTuple() {
          this.collector.emit(new Values(sentences[index]));
          index++;
          if (index >= sentences.length) {
              index = 0;
          }
      }
  }
  30. A Sample Word Count Code.
  public class SplitSentenceBolt extends BaseRichBolt {
      private OutputCollector collector;

      public void prepare(Map config, TopologyContext context, OutputCollector collector) {
          this.collector = collector;
      }

      // Split each incoming sentence on spaces and emit one tuple per word.
      public void execute(Tuple tuple) {
          String sentence = tuple.getStringByField("sentence");
          String[] words = sentence.split(" ");
          for (String word : words) {
              this.collector.emit(new Values(word));
          }
      }

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("word"));
      }
  }
  31. A Sample Word Count Code.
  public class WordCountBolt extends BaseRichBolt {
      private OutputCollector collector;
      private HashMap<String, Long> counts = null;

      public void prepare(Map config, TopologyContext context, OutputCollector collector) {
          this.collector = collector;
          this.counts = new HashMap<String, Long>();
      }

      // Increment the running count for each word and emit the updated pair.
      public void execute(Tuple tuple) {
          String word = tuple.getStringByField("word");
          Long count = this.counts.get(word);
          if (count == null) {
              count = 0L;
          }
          count++;
          this.counts.put(word, count);
          this.collector.emit(new Values(word, count));
      }

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("word", "count"));
      }
  }
  32. A Sample Word Count Code.
  public class ReportBolt extends BaseRichBolt {
      private HashMap<String, Long> counts = null;

      public void prepare(Map config, TopologyContext context, OutputCollector collector) {
          this.counts = new HashMap<String, Long>();
      }

      // Remember the latest count seen for each word.
      public void execute(Tuple tuple) {
          String word = tuple.getStringByField("word");
          Long count = tuple.getLongByField("count");
          this.counts.put(word, count);
      }

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
          // this bolt does not emit anything
      }

      // On shutdown, print the final counts in alphabetical order.
      public void cleanup() {
          List<String> keys = new ArrayList<String>();
          keys.addAll(this.counts.keySet());
          Collections.sort(keys);
          for (String key : keys) {
              System.out.println(key + " : " + this.counts.get(key));
          }
      }
  }
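A sketch of how the four components above might be assembled and run. The topology name, component ids, the global grouping into the report bolt, and the use of an in-process LocalCluster with an arbitrary 10-second run are illustrative assumptions, not from the deck:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.apache.storm.utils.Utils;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentence-spout", new SentenceSpout());
        builder.setBolt("split-bolt", new SplitSentenceBolt())
               .shuffleGrouping("sentence-spout");
        builder.setBolt("count-bolt", new WordCountBolt())
               .fieldsGrouping("split-bolt", new Fields("word"));
        // Global grouping: route every count tuple to the single report task.
        builder.setBolt("report-bolt", new ReportBolt())
               .globalGrouping("count-bolt");

        Config config = new Config();

        // Run in-process for development; on a real cluster one would use
        // StormSubmitter.submitTopology(...) together with the "storm jar" command.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count-topology", config, builder.createTopology());
        Utils.sleep(10000);               // let the topology run for 10 seconds
        cluster.killTopology("word-count-topology");
        cluster.shutdown();               // ReportBolt.cleanup() then prints the counts
    }
}
```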
