2. What is stream processing?
Stream processing is a technical paradigm for processing large volumes of
unbounded sequences of tuples (streams) in real time
[Diagram: Source, Stream, Processor]
• Continuous analytics
• Online machine learning
• Sensor data monitoring
• Financial trading …
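A minimal sketch of the idea on slide 2, in plain Python rather than Storm: a source emits an unbounded sequence of tuples and a processor updates its result as each tuple arrives. The sensor names, the value range, and both function names are hypothetical and chosen only for illustration.

```python
import itertools
import random


def sensor_source(seed=0):
    """Hypothetical unbounded source: yields (sensor_id, reading) tuples forever."""
    rng = random.Random(seed)
    while True:
        yield (rng.choice(["s1", "s2", "s3"]), rng.uniform(15.0, 30.0))


def running_average(stream):
    """Processor: emits (sensor_id, running mean) as each tuple arrives."""
    totals = {}  # sensor_id -> (count, sum of readings)
    for sensor_id, reading in stream:
        count, total = totals.get(sensor_id, (0, 0.0))
        count, total = count + 1, total + reading
        totals[sensor_id] = (count, total)
        yield (sensor_id, total / count)


# The stream never ends, so results are consumed as they arrive.
# The demo stops after five tuples only to stay finite.
first_five = list(itertools.islice(running_average(sensor_source()), 5))
for sensor_id, mean in first_five:
    print(sensor_id, round(mean, 2))
```

The processor never waits for the end of the input; that is the difference from a batch job over a finite data set.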
4. What is Storm?
Storm is a distributed realtime computation system
• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source
- Originally developed by Nathan Marz at
BackType (acquired by Twitter)
- Written in Java and Clojure
15. Stream grouping
• Shuffle grouping: pick a random task
• Fields grouping: hashing on a subset of tuple fields
(tuples with equal values in those fields go to the same task)
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
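The four groupings can be sketched as functions that map a tuple to the list of task ids that should receive it. This is an illustration in plain Python, not Storm's API: `NUM_TASKS`, the function names, and the use of a CRC32 hash modulo the task count are all assumptions made for the sketch.

```python
import random
import zlib

NUM_TASKS = 4  # hypothetical number of tasks in the receiving component


def shuffle_grouping(tup, rng=random):
    """Shuffle grouping: pick a random task."""
    return [rng.randrange(NUM_TASKS)]


def fields_grouping(tup, field_indexes):
    """Fields grouping: hash the selected fields to choose a task.

    Assumption: a CRC32 hash modulo the task count stands in for
    whatever hashing scheme the real system uses.
    """
    key = repr(tuple(tup[i] for i in field_indexes)).encode("utf-8")
    return [zlib.crc32(key) % NUM_TASKS]


def all_grouping(tup):
    """All grouping: send the tuple to every task."""
    return list(range(NUM_TASKS))


def global_grouping(tup):
    """Global grouping: send the tuple to the task with the lowest id."""
    return [0]


# Two tuples that share the same word are routed to the same task,
# which is what makes per-key state such as word counts possible.
same_task = fields_grouping(("apple", 1), [0]) == fields_grouping(("apple", 99), [0])
print(same_task)
```

Fields grouping is the one to reach for when a task keeps state per key; shuffle grouping spreads load evenly when no such state exists.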
25. Easy to set up & operate
• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker
machines
- ZeroMQ 2.1.7 and JZMQ
- Java 6 and Python 2.6.6
- unzip
• Download and extract a Storm release to Nimbus
and worker machines
• Fill in the mandatory configuration in storm.yaml
• Launch daemons under supervision using “storm”
script
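As a rough illustration of the storm.yaml step, the sketch below renders a minimal configuration file from Python. The key names (`storm.zookeeper.servers`, `nimbus.host`, `storm.local.dir`, `supervisor.slots.ports`) are the ones recalled from Storm releases of this era and should be checked against the documentation of the release in use; the host names, directory, and the helper `render_storm_yaml` are hypothetical.

```python
def render_storm_yaml(zookeeper_servers, nimbus_host, local_dir, slot_ports):
    """Render a minimal storm.yaml as text (hypothetical helper, not part of Storm)."""
    lines = ["storm.zookeeper.servers:"]
    lines += ['  - "%s"' % host for host in zookeeper_servers]
    lines.append('nimbus.host: "%s"' % nimbus_host)
    lines.append('storm.local.dir: "%s"' % local_dir)
    # One port per worker slot on each worker machine.
    lines.append("supervisor.slots.ports:")
    lines += ["  - %d" % port for port in slot_ports]
    return "\n".join(lines) + "\n"


# All values below are placeholders for illustration.
config_text = render_storm_yaml(
    zookeeper_servers=["zk1.example.com", "zk2.example.com"],
    nimbus_host="nimbus.example.com",
    local_dir="/var/storm",
    slot_ports=[6700, 6701, 6702, 6703],
)
print(config_text)
```

With the file in place on every machine, the daemons are started through the storm script, for example `storm nimbus` on the master and `storm supervisor` on each worker, each run under a process supervisor so that it is restarted if it exits.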