1. Introduction to Real Time Analytics using Apache Storm
www.edureka.in/apache-storm
Buy Complete Course at : www.edureka.in/apache-storm
Post your Questions on Twitter on @edurekaIN: #askEdureka
2. Objectives of this Session
• Un
• The need for Real Time Analytics – Use cases
• How does Storm come to the rescue?
• Where does Storm fit in the Hadoop framework?
• Storm Architecture – Components of Storm
• Quiz to reinforce your learning
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
www.edureka.in/apache-storm
3. Need for Real Time Analytics
• Banking – Fraud Transaction Detection
• Telecommunication – Silent Roamers Detection
• Retail – Inventory Dynamic Pricing
• Social Networking – Trending Topics
4. Growing Interest in Apache Storm
5. Storm Use Cases – Need for Real Time Analytics
• Twitter Trends
• Responsive Logs
• Custom Magazine Feeds
• Real Time Video Analytics
• Enable Clinicians to Make Medical Decisions
• Compare and Display Real Time Prices
Source: https://github.com/nathanmarz/storm/wiki/Powered-By
6. What is Storm?
Apache Storm is a free and open source distributed real-time computation system.
Storm makes it easy to reliably process unbounded streams of data.
Storm does for real-time processing what Hadoop did for batch processing.
Storm is simple and can be used with any programming language.
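To make "unbounded stream" concrete, here is a minimal plain-Python sketch. It is not Storm code: the names `event_stream` and `running_counts` and the sample hashtags are made up for illustration. It only shows the idea of updating a result after every event instead of waiting for the input to end, as a batch job would.

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, Iterator


def event_stream() -> Iterator[str]:
    """Hypothetical endless source: cycles through a few hashtags forever."""
    return itertools.cycle(["#storm", "#hadoop", "#storm"])


def running_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Yield an updated count after every single event."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)


# The stream never ends, so only the first four results are inspected here.
first_four = list(itertools.islice(running_counts(event_stream()), 4))
print(first_four[-1])
```

A batch job could not start on `event_stream()` at all, because there is no "end of input" to wait for. The incremental version has a usable answer after the very first event.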
7. Understanding the Storm Architecture
[Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes]
*Covered in module 2 in the course
8. Storm Components
A Storm cluster has 3 sets of nodes:
1. Nimbus node
2. ZooKeeper nodes
3. Supervisor nodes
[Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes]
Nimbus node (master node, similar to the Hadoop JobTracker):
» Uploads computations for execution
» Distributes code across the cluster
» Launches workers across the cluster
» Monitors computation and reallocates workers as needed
ZooKeeper nodes:
» Coordinate the Storm cluster
Supervisor nodes:
» Communicate with Nimbus through ZooKeeper and start and stop workers according to signals from Nimbus
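The division of labour between the three node types can be sketched as a toy model in plain Python. This is not the Storm API: `ZooKeeperStub`, `Nimbus.assign`, `Supervisor.sync` and the round-robin placement are all assumptions chosen to keep the example small. The point is only that Nimbus and the Supervisors never talk directly; they read and write shared state that stands in for ZooKeeper.

```python
from typing import Dict, List, Set


class ZooKeeperStub:
    """Stand-in for ZooKeeper: shared state that both sides read and write."""

    def __init__(self) -> None:
        self.alive: Set[str] = set()                  # supervisors known to be up
        self.assignments: Dict[str, List[str]] = {}   # supervisor -> worker names


class Nimbus:
    """Toy master node: decides which supervisor runs which worker."""

    def __init__(self, zk: ZooKeeperStub) -> None:
        self.zk = zk

    def assign(self, workers: List[str]) -> None:
        # Round-robin placement is an assumption, not Storm's real scheduler.
        supervisors = sorted(self.zk.alive)
        if not supervisors:
            raise RuntimeError("no supervisors available")
        self.zk.assignments = {name: [] for name in supervisors}
        for i, worker in enumerate(workers):
            self.zk.assignments[supervisors[i % len(supervisors)]].append(worker)


class Supervisor:
    """Toy supervisor: starts and stops workers to match its assignment."""

    def __init__(self, name: str, zk: ZooKeeperStub) -> None:
        self.name = name
        self.zk = zk
        self.running: List[str] = []
        zk.alive.add(name)  # announce itself through the shared state

    def sync(self) -> None:
        self.running = list(self.zk.assignments.get(self.name, []))


zk = ZooKeeperStub()
nimbus = Nimbus(zk)
supervisors = [Supervisor(f"supervisor-{i}", zk) for i in (1, 2, 3)]
workers = ["w1", "w2", "w3", "w4"]

nimbus.assign(workers)
for s in supervisors:
    s.sync()
before = {s.name: s.running for s in supervisors}

# Simulate supervisor-2 dying: Nimbus reallocates its workers to the survivors.
zk.alive.discard("supervisor-2")
nimbus.assign(workers)
survivors = [s for s in supervisors if s.name in zk.alive]
for s in survivors:
    s.sync()
after = {s.name: s.running for s in survivors}
print(before)
print(after)
```

In a real cluster the "is this supervisor alive" signal comes from heartbeats rather than a manual `discard`, but the flow is the same: Nimbus writes the desired state and each Supervisor brings its own workers in line with it.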
9. Storm Topology
The work is delegated to different types of components, each responsible for a simple, specific processing task.
The input stream of a Storm cluster is handled by a component called a spout.
The spout passes the data to a component called a bolt, which transforms it in some way.
A bolt either persists the data in some sort of storage or passes it to another bolt.
[Diagram: input data source → spouts (pass data) → bolts (transform data) → data storage]
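The spout-to-bolt flow can be sketched with plain Python generators. This is an illustration of the data flow only, not the Storm API: `sentence_spout`, `split_bolt`, `count_bolt` and the sample sentences are hypothetical names and data. One spout emits sentences, a first bolt transforms them into words and passes them on, and a second bolt persists counts into a dictionary that stands in for data storage.

```python
from typing import Dict, Iterable, Iterator


def sentence_spout() -> Iterator[str]:
    """Spout: the entry point for input data (here, a fixed sample)."""
    yield from ["storm processes streams", "storm is fast"]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt that transforms data: sentence -> words, passed to the next bolt."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words: Iterable[str], storage: Dict[str, int]) -> None:
    """Bolt that persists data: updates word counts in the given storage."""
    for word in words:
        storage[word] = storage.get(word, 0) + 1


# Wire the topology: spout -> split bolt -> count bolt -> storage.
storage: Dict[str, int] = {}
count_bolt(split_bolt(sentence_spout()), storage)
print(storage)
```

Each component does one small job, so a topology is changed by rewiring components rather than by rewriting them.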
10. Why Storm is ideal for Real Time Processing
Fast – benchmarked at one million 100-byte messages per second per node.
Scalable – parallel calculations run across a cluster of machines.
Fault-tolerant – when workers die, Storm automatically restarts them. If a node dies, the worker is restarted on another node.
Reliable – Storm guarantees that each unit of data (tuple) will be processed at least once or exactly once. Messages are replayed only when there are failures.
Easy to operate – standard configurations are suitable for production on day one. Once deployed, Storm is easy to operate.
Source: http://hortonworks.com/hadoop/storm/
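The "replayed only when there are failures" behaviour behind at-least-once processing can be sketched in plain Python. This is a simplified model, not Storm's acking mechanism: `process_at_least_once`, `flaky_bolt` and the `max_attempts` cut-off are assumptions added so the example stays short and always terminates.

```python
from collections import deque
from typing import Callable, Deque, List, Set, Tuple


def process_at_least_once(
    tuples: List[str],
    bolt: Callable[[str], None],
    max_attempts: int = 3,  # assumption: give up eventually so the sketch terminates
) -> List[str]:
    """Replay a tuple whenever the bolt fails on it; return tuples given up on."""
    pending: Deque[Tuple[str, int]] = deque((t, 1) for t in tuples)
    dropped: List[str] = []
    while pending:
        item, attempt = pending.popleft()
        try:
            bolt(item)  # returning normally plays the role of an "ack"
        except Exception:
            if attempt < max_attempts:
                pending.append((item, attempt + 1))  # failure: replay later
            else:
                dropped.append(item)
    return dropped


processed: List[str] = []
failed_once: Set[str] = set()


def flaky_bolt(item: str) -> None:
    """Hypothetical bolt that fails the first time it sees tuple 'b'."""
    if item == "b" and item not in failed_once:
        failed_once.add(item)
        raise RuntimeError("simulated worker failure")
    processed.append(item)


dropped = process_at_least_once(["a", "b", "c"], flaky_bolt)
print(processed, dropped)
```

Here the failure happens before the bolt records anything, so "b" ends up processed once. If a bolt failed after its side effect, the replay would apply it a second time, which is why the basic guarantee is "at least once" rather than "exactly once".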
12. Upcoming Batch for Storm
Start Date:
16th Aug (08:30 PM – 11:30 PM, India Time) / 16th Aug (08:00 AM – 11:00 AM, Pacific Time)
13th Sep (7:00 AM – 10:00 AM, India Time) / 12th Sep (06:30 PM – 09:30 PM, Pacific Time)
Curriculum:
Module 1: Introduction of Big Data and Storm
Module 2: Getting Started with Storm
Module 3: Spouts and Bolts
Module 4: Trident Topologies
Module 5: Real Life Storm Project – 1
Module 6: Real Life Storm Project – 2