SlideShare ist ein Scribd-Unternehmen logo
1 von 49
Downloaden Sie, um offline zu lesen
Page1
Developing Java Streaming Applications
with Apache Storm
Lester Martin www.ajug.org - Nov 2017
Page2
Connection before Content
Lester Martin – Hadoop/Spark/Storm Trainer & Consultant
lester.martin@gmail.com
http://lester.website (links to blog, twitter,
github, LI, FB, etc)
Page3
Agenda – Needs Updating!!!!
• What is Storm?
• Conceptual Model
• Compile Time
• DEMO: Develop Word Count Topology
• Runtime
• DEMO: Submit Word Count Topology
• Additional Features
• DEMO: Kafka > Storm > HBase Topology in Local Cluster
Page4
What is Storm?
Page5
Storm is …
à Streaming
– Key enabler of the Lambda Architecture
à Fast
– Clocked at 1M+ messages per second per node
à Scalable
– Thousands of workers per cluster
à Fault Tolerant
– Failure is expected, and embraced
à Reliable
– Guaranteed message delivery
– Exactly-once semantics
Page6
Storm in the Lambda Architecture
persists data
Hadoop
batch processing
batch feeds
Update event models
Pattern templates, key-
performance indicators, and
alerts
Dashboards and Applications
Stormreal-time data
feeds
Page7
Conceptual Model
Page8
TUPLE
{…}
Page9
Tuple
à Unit of work to be processes
à Immutable ordered set of serializable values
à Fields must have assigned name
{…}
Page10
Stream
à Core abstraction of Storm
à Unbounded sequence of Tuples
{…} {…} {…} {…} {…} {…} {…}
Page11
SPOUT
Page12
Spout
à Source of Streams
à Wrap an event source and emit Tuples
Page13
Message Queues
Message queues are often the source of the data processed by Storm
Storm Spouts integrate with many types of message queues
real-time data
source
operating
systems,
services and
applications,
sensors
Kestrel,
RabbitMQ,
AMQP, Kafka,
JMS, others…
message
queue
log entries,
events, errors,
status
messages, etc.
Storm
data from queue
is read by Storm
Page14
BOLT
Page15
Bolt
à Core unit of computation
à Receive Tuples and do stuff
à Optionally, emit additional Tuples
Page16
Bolt
à Write to a data store
Page17
Bolt
à Read from a data store
Page18
Bolt
à Perform arbitrary computation
Page19
Bolt
à (Optionally) Emit additional Stream(s)
Page20
TOPOLOGY
Page21
Topology
à DAG of Spouts and Bolts
à Data Flow Representation
à Streaming Computation
Page22
Topology
à Storm executes Spouts and Bolts as Tasks that run in parallel on
multiple machines
Page23
Parallel Execution of Topology Components
a logical
topology
spout A
bolt A bolt B
bolt C
a physical
implementation
machine A
machine B
machine E
machine C
machine D
machine F
machine G
spout A
two tasks
bolt A
two tasks
bolt B two
tasks
bolt C
one task
Page24
Stream Groupings
Stream Groupings determine how Storm routes Tuples between Tasks
Grouping Type Routing Behavior
Shuffle Randomized round-robin (evenly distribute
load to downstream Bolts)
Fields Ensures all Tuples with the same Field
value(s) are always routed to the same Task
All Replicates Stream across all the Bolt’s
Tasks (use with care)
Other options Including custom RYO grouping logic
Page25
Compile Time
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(”sentence"));
}
Page26
Example Spout Code (1 of 2)
public class RandomSentenceSpout extends BaseRichSpout {
SpoutOutputCollector _collector;
Random _rand;
@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
_collector = collector;
_rand = new Random();
}
@Override
public void nextTuple() {
Utils.sleep(100);
String[] sentences = new String[]{ "the cow jumped over the moon", "an apple a day keeps
the doctor away", "four score and seven years ago", "snow white and the seven dwarfs",
"i am at two with nature" };
String sentence = sentences[_rand.nextInt(sentences.length)];
_collector.emit(new Values(sentence));
}
Continued next page…
Storm uses open to open the spout and provide it with its configuration,
a context object providing information about components in the
topology, and an output collector used to emit tuples.
Storm uses nextTuple to request
the spout emit the next tuple.
The spout uses emit to send a
tuple to one or more bolts.
Name of the spout class. Storm spout class used as a “template”.
Page27
Example Spout Code (2 of 2)
@Override
public void ack(Object id) {
}
@Override
public void fail(Object id) {
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(”sentence"));
}
}
Storm calls the spout’s ack method to signal that
a tuple has been fully processed.
Storm calls the spout’s fail method to signal
that a tuple has not been fully processed.
The declareOutputFields
method names the fields in a tuple.
Continued…
Page28
Example Bolt Code
public static class ExclamationBolt extends BaseRichBolt {
OutputCollector _collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
public void execute(Tuple tuple) {
_collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
_collector.ack(tuple);
}
public void cleanup(); {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
The prepare method
provides the bolt with
its configuration and
an
OutputCollector
used to emit tuples.
The execute method
receives a tuple from a
stream and emits a
new tuple. It also
provides an ack
method that can be
used after successful
delivery.
The cleanup method
releases system
resources when bolt is
shut down.
Names the fields in the output
tuples. More detail later.
Name of the bolt class. Bolt class used as a “template.”
Page29
Example Topology Code
public static main(String[] args) throws exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(“words”, new TestWordSpout());
builder.setBolt(“exclaim1”, new NewExclamationBolt()).shuffleGrouping(“words”);
builder.setBolt(“exclaim2”, new NewExclamationBolt()).shuffleGrouping(“exclaim1”);
Config conf = new Config();
StormSubmitter.submitTopology(”add-exclamation", conf, builder.createTopology());
}
This code…
words exclaim1 exclaim2
shuffleGrouping shuffleGrouping
…builds this
Topology.
runs code in
TestWordSpout()
runs code in
NewExclamationBolt()
runs code in
NewExclamationBolt()
Page30
DEMO
Develop Word Count Topology
Page31
Runtime
Nimbus
Supervisor
Supervisor
Supervisor
Supervisor
Page32
Physical View
Page33
Topology Submitter uploads topology:
• topology.jar
• topology.ser
• conf.ser
Topology Deployment
Page34
Topology Deployment
Nimbus calculates assignments and sends to Zookeeper
Page35
Topology Deployment
Supervisor nodes receive assignment information
via Zookeeper watches
Page36
Topology Deployment
Supervisor nodes download topology from Nimbus:
• topology.jar
• topology.ser
• conf.ser
Page37
Topology Deployment
Supervisors spawn workers (JVM processes)
Page38
DEMO
Submit Topology to Storm Topology
Page39
Additional Features
FAIL
Page40
Local Versus Distributed Storm Clusters
The topology program code submitted to Storm using storm jar is
different when submitting to local mode versus a distributed cluster.
The submitTopology method is used in both cases.
• The difference is the class that contains the submitTopology method.
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
LocalCluster.submitTopology("mytopology", conf, topology);
Config conf = new Config();
StormSubmitter.submitTopology("mytopology", conf, topology);
Instantiate a local
cluster object.
Submit a topology
to a local cluster.
Submit a topology to a
distributed cluster.
Same method
name, different
classes
Same method
name, different
classes.
Page41
Reliable Processing
Bolts may emit Tuples Anchored to one received.
Tuple “B” is a descendant of Tuple “A”
Page42
Reliable Processing
Multiple Anchorings form a Tuple tree
(bolts not shown)
Page43
Reliable Processing
Bolts can Acknowledge that a tuple
has been processed successfully.
ACK
Page44
Reliable Processing
Bolts can also Fail a tuple to trigger a spout to
replay the original.
FAIL
Page45
Reliable Processing
Any failure in the Tuple tree will trigger a
replay of the original tuple
Page46
More Stuff
à Topology description/deployment options
– Flux
– Storm SQL
à Polyglot development
à Micro-batching with Trident
à Fault tolerance & deployment isolation
à Integrations
– Messaging; Kafka, Redis, Kestrel, Kinesis, MQTT, JMS
– Databases; HBase, Hive, Druid, Cassandra, MongoDB, JDBC
– Search Engines; Solr, Elasticsearch
– HDFS
– And more!
Page47
DEMO
Kafka > Storm > HBase Topology in a Local Cluster
Page48
Kafka > Storm > HBase Example
Requirements:
• Land simulated server logs into Kafka
• Configure a Kafka Bolt to consume the server log messages
• Ignore all messages that are not either WARN or ERROR
• Persist WARN and ERROR messages into HBase
– Keep 10 most recent messages for each server
– Maintain a running total of these concerning messages
• Publish these messages back to Kafka
Kafka
Kafka
HBase
HBaseParse FilterKafka
Kafka
Page49
Questions?
Lester Martin – Hadoop/Spark/Storm Trainer & Consultant
lester.martin@gmail.com
http://lester.website (links to blog, twitter, github, LI, FB, etc)
THANKS FOR YOUR TIME!!

Weitere ähnliche Inhalte

Was ist angesagt?

Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
AbDul ThaYyal
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
nathanmarz
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
Haris456
 

Was ist angesagt? (20)

Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel Programming
 
RPC communication,thread and processes
RPC communication,thread and processesRPC communication,thread and processes
RPC communication,thread and processes
 
Let's Talk Locks!
Let's Talk Locks!Let's Talk Locks!
Let's Talk Locks!
 
page replacement.pptx
page replacement.pptxpage replacement.pptx
page replacement.pptx
 
Google File System
Google File SystemGoogle File System
Google File System
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Operating system 25 classical problems of synchronization
Operating system 25 classical problems of synchronizationOperating system 25 classical problems of synchronization
Operating system 25 classical problems of synchronization
 
Global state recording in Distributed Systems
Global state recording in Distributed SystemsGlobal state recording in Distributed Systems
Global state recording in Distributed Systems
 
Interprocess communication (IPC) IN O.S
Interprocess communication (IPC) IN O.SInterprocess communication (IPC) IN O.S
Interprocess communication (IPC) IN O.S
 
Multiprocessor Scheduling
Multiprocessor SchedulingMultiprocessor Scheduling
Multiprocessor Scheduling
 
Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"
 
Network-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQNetwork-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQ
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 

Ähnlich wie Developing Java Streaming Applications with Apache Storm

Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
oscon2007
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
oscon2007
 

Ähnlich wie Developing Java Streaming Applications with Apache Storm (20)

Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
Storm is coming
Storm is comingStorm is coming
Storm is coming
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Storm
StormStorm
Storm
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 
Storm 0.8.2
Storm 0.8.2Storm 0.8.2
Storm 0.8.2
 
Java design patterns
Java design patternsJava design patterns
Java design patterns
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptx
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache Storm
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Continuations in scala (incomplete version)
Continuations in scala (incomplete version)Continuations in scala (incomplete version)
Continuations in scala (incomplete version)
 
The GO Language : From Beginners to Gophers
The GO Language : From Beginners to GophersThe GO Language : From Beginners to Gophers
The GO Language : From Beginners to Gophers
 
Storm
StormStorm
Storm
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 

Kürzlich hochgeladen

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 

Kürzlich hochgeladen (20)

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 

Developing Java Streaming Applications with Apache Storm

  • 1. Page1 Developing Java Streaming Applications with Apache Storm Lester Martin www.ajug.org - Nov 2017
  • 2. Page2 Connection before Content Lester Martin – Hadoop/Spark/Storm Trainer & Consultant lester.martin@gmail.com http://lester.website (links to blog, twitter, github, LI, FB, etc)
  • 3. Page3 Agenda – Needs Updating!!!! • What is Storm? • Conceptual Model • Compile Time • DEMO: Develop Word Count Topology • Runtime • DEMO: Submit Word Count Topology • Additional Features • DEMO: Kafka > Storm > HBase Topology in Local Cluster
  • 5. Page5 Storm is … à Streaming – Key enabler of the Lambda Architecture à Fast – Clocked at 1M+ messages per second per node à Scalable – Thousands of workers per cluster à Fault Tolerant – Failure is expected, and embraced à Reliable – Guaranteed message delivery – Exactly-once semantics
  • 6. Page6 Storm in the Lambda Architecture persists data Hadoop batch processing batch feeds Update event models Pattern templates, key- performance indicators, and alerts Dashboards and Applications Stormreal-time data feeds
  • 9. Page9 Tuple à Unit of work to be processes à Immutable ordered set of serializable values à Fields must have assigned name {…}
  • 10. Page10 Stream à Core abstraction of Storm à Unbounded sequence of Tuples {…} {…} {…} {…} {…} {…} {…}
  • 12. Page12 Spout à Source of Streams à Wrap an event source and emit Tuples
  • 13. Page13 Message Queues Message queues are often the source of the data processed by Storm Storm Spouts integrate with many types of message queues real-time data source operating systems, services and applications, sensors Kestrel, RabbitMQ, AMQP, Kafka, JMS, others… message queue log entries, events, errors, status messages, etc. Storm data from queue is read by Storm
  • 15. Page15 Bolt à Core unit of computation à Receive Tuples and do stuff à Optionally, emit additional Tuples
  • 16. Page16 Bolt à Write to a data store
  • 17. Page17 Bolt à Read from a data store
  • 19. Page19 Bolt à (Optionally) Emit additional Stream(s)
  • 21. Page21 Topology à DAG of Spouts and Bolts à Data Flow Representation à Streaming Computation
  • 22. Page22 Topology à Storm executes Spouts and Bolts as Tasks that run in parallel on multiple machines
  • 23. Page23 Parallel Execution of Topology Components a logical topology spout A bolt A bolt B bolt C a physical implementation machine A machine B machine E machine C machine D machine F machine G spout A two tasks bolt A two tasks bolt B two tasks bolt C one task
  • 24. Page24 Stream Groupings Stream Groupings determine how Storm routes Tuples between Tasks Grouping Type Routing Behavior Shuffle Randomized round-robin (evenly distribute load to downstream Bolts) Fields Ensures all Tuples with the same Field value(s) are always routed to the same Task All Replicates Stream across all the Bolt’s Tasks (use with care) Other options Including custom RYO grouping logic
  • 25. Page25 Compile Time @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(”sentence")); }
  • 26. Page26 Example Spout Code (1 of 2) public class RandomSentenceSpout extends BaseRichSpout { SpoutOutputCollector _collector; Random _rand; @Override public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { _collector = collector; _rand = new Random(); } @Override public void nextTuple() { Utils.sleep(100); String[] sentences = new String[]{ "the cow jumped over the moon", "an apple a day keeps the doctor away", "four score and seven years ago", "snow white and the seven dwarfs", "i am at two with nature" }; String sentence = sentences[_rand.nextInt(sentences.length)]; _collector.emit(new Values(sentence)); } Continued next page… Storm uses open to open the spout and provide it with its configuration, a context object providing information about components in the topology, and an output collector used to emit tuples. Storm uses nextTuple to request the spout emit the next tuple. The spout uses emit to send a tuple to one or more bolts. Name of the spout class. Storm spout class used as a “template”.
  • 27. Page27 Example Spout Code (2 of 2) @Override public void ack(Object id) { } @Override public void fail(Object id) { } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(”sentence")); } } Storm calls the spout’s ack method to signal that a tuple has been fully processed. Storm calls the spout’s fail method to signal that a tuple has not been fully processed. The declareOutputFields method names the fields in a tuple. Continued…
  • 28. Page28 Example Bolt Code public static class ExclamationBolt extends BaseRichBolt { OutputCollector _collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { _collector = collector; } public void execute(Tuple tuple) { _collector.emit(tuple, new Values(tuple.getString(0) + "!!!")); _collector.ack(tuple); } public void cleanup(); { } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } } The prepare method provides the bolt with its configuration and an OutputCollector used to emit tuples. The execute method receives a tuple from a stream and emits a new tuple. It also provides an ack method that can be used after successful delivery. The cleanup method releases system resources when bolt is shut down. Names the fields in the output tuples. More detail later. Name of the bolt class. Bolt class used as a “template.”
  • 29. Page29 Example Topology Code public static main(String[] args) throws exception { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(“words”, new TestWordSpout()); builder.setBolt(“exclaim1”, new NewExclamationBolt()).shuffleGrouping(“words”); builder.setBolt(“exclaim2”, new NewExclamationBolt()).shuffleGrouping(“exclaim1”); Config conf = new Config(); StormSubmitter.submitTopology(”add-exclamation", conf, builder.createTopology()); } This code… words exclaim1 exclaim2 shuffleGrouping shuffleGrouping …builds this Topology. runs code in TestWordSpout() runs code in NewExclamationBolt() runs code in NewExclamationBolt()
  • 33. Page33 Topology Submitter uploads topology: • topology.jar • topology.ser • conf.ser Topology Deployment
  • 34. Page34 Topology Deployment Nimbus calculates assignments and sends to Zookeeper
  • 35. Page35 Topology Deployment Supervisor nodes receive assignment information via Zookeeper watches
  • 36. Page36 Topology Deployment Supervisor nodes download topology from Nimbus: • topology.jar • topology.ser • conf.ser
  • 40. Page40 Local Versus Distributed Storm Clusters The topology program code submitted to Storm using storm jar is different when submitting to local mode versus a distributed cluster. The submitTopology method is used in both cases. • The difference is the class that contains the submitTopology method. Config conf = new Config(); LocalCluster cluster = new LocalCluster(); LocalCluster.submitTopology("mytopology", conf, topology); Config conf = new Config(); StormSubmitter.submitTopology("mytopology", conf, topology); Instantiate a local cluster object. Submit a topology to a local cluster. Submit a topology to a distributed cluster. Same method name, different classes Same method name, different classes.
  • 41. Page41 Reliable Processing Bolts may emit Tuples Anchored to one received. Tuple “B” is a descendant of Tuple “A”
  • 42. Page42 Reliable Processing Multiple Anchorings form a Tuple tree (bolts not shown)
  • 43. Page43 Reliable Processing Bolts can Acknowledge that a tuple has been processed successfully. ACK
  • 44. Page44 Reliable Processing Bolts can also Fail a tuple to trigger a spout to replay the original. FAIL
  • 45. Page45 Reliable Processing Any failure in the Tuple tree will trigger a replay of the original tuple
  • 46. Page46 More Stuff à Topology description/deployment options – Flux – Storm SQL à Polyglot development à Micro-batching with Trident à Fault tolerance & deployment isolation à Integrations – Messaging; Kafka, Redis, Kestrel, Kinesis, MQTT, JMS – Databases; HBase, Hive, Druid, Cassandra, MongoDB, JDBC – Search Engines; Solr, Elasticsearch – HDFS – And more!
  • 47. Page47 DEMO Kafka > Storm > HBase Topology in a Local Cluster
  • 48. Page48 Kafka > Storm > HBase Example Requirements: • Land simulated server logs into Kafka • Configure a Kafka Bolt to consume the server log messages • Ignore all messages that are not either WARN or ERROR • Persist WARN and ERROR messages into HBase – Keep 10 most recent messages for each server – Maintain a running total of these concerning messages • Publish these messages back to Kafka Kafka Kafka HBase HBaseParse FilterKafka Kafka
  • 49. Page49 Questions? Lester Martin – Hadoop/Spark/Storm Trainer & Consultant lester.martin@gmail.com http://lester.website (links to blog, twitter, github, LI, FB, etc) THANKS FOR YOUR TIME!!