Storm is a real-time distributed computation system that also provides a distributed RPC (DRPC) service.
In these slides, we'll learn how to use Storm DRPC to build an online, real-time prediction service.
Storm DRPC offers load balancing and real-time responses.
4. Storm DRPC
• https://github.com/nathanmarz/storm/wiki/Distributed-RPC
• The DRPC daemon receives requests and distributes
them to a user-defined bolt/topology.
• We follow the examples in
https://github.com/nathanmarz/storm-starter to build a
Python DRPC service.
• The benefits provided by Storm:
– Load balancing and resource allocation
– Real-time service
– Fault tolerance
Storm http://storm-project.net 4
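The request flow described on this slide can be illustrated with a toy, in-process sketch. This is not Storm's API: ToyDRPCServer, make_worker and the "sum" function are hypothetical names invented for the illustration. It only shows the idea that the daemon tags each request with an id, hands [id, args] to one of several workers in turn, and matches the reply by id.

```python
import itertools
import json


class ToyDRPCServer(object):
    """In-process stand-in for the DRPC daemon (hypothetical, NOT Storm's API)."""

    def __init__(self):
        self._functions = {}              # function name -> cycle of workers
        self._next_id = itertools.count(1)

    def register(self, name, workers):
        # Round-robin over the workers to mimic load balancing.
        self._functions[name] = itertools.cycle(workers)

    def execute(self, name, args):
        request_id = str(next(self._next_id))
        worker = next(self._functions[name])
        # A worker receives [request_id, args] and returns [request_id, result],
        # mirroring the "id" and "result" fields a DRPC bolt emits.
        returned_id, result = worker([request_id, args])
        if returned_id != request_id:
            raise RuntimeError("reply does not match the request id")
        return result


def make_worker(label):
    """Build a worker that sums a JSON list and reports which worker ran."""
    def worker(values):
        request_id, args = values
        numbers = json.loads(args)
        return [request_id, json.dumps({"worker": label, "sum": sum(numbers)})]
    return worker


server = ToyDRPCServer()
server.register("sum", [make_worker("a"), make_worker("b")])
first = json.loads(server.execute("sum", "[1, 2, 3]"))
second = json.loads(server.execute("sum", "[10, 20]"))
print(first, second)
```

The two calls land on different workers, which is the load-balancing behaviour in miniature; in real Storm the distribution and the id bookkeeping are handled by the DRPC daemon and the topology.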
5. Example: Build a real-time SVM prediction service with Storm DRPC
• Goal: We have a trained SVM model, and plan
to provide a real-time prediction service.
– Steps:
1. Train the SVM model.
2. Build the Storm DRPC topology with Python Bolt.
3. Deploy the topology to Storm.
4. Build the Storm DRPC Client.
5. Prediction on the fly.
• Code repository: storm_demo directory in
https://bitbucket.org/noahsark/slideshare
6. Step 1. Train the SVM model.
• Note: the following code is in the storm_demo directory.
$ ./train_model.py
• We use the 20 newsgroups data from sklearn to
build an SVM classification model.
• The output model is a pickle file (svm_model.pkl)
in storm-starter/multilang/resources/
7. Step 2. Build the Storm DRPC topology with Python Bolt.
• The storm-starter directory comes from
https://github.com/nathanmarz/storm-starter. It contains
many example topologies; we'll build our DRPC
topology in storm-starter/src/jvm/storm/jimmy:
SVMDRPCTopology.java
• We build the DRPC topology with
LinearDRPCTopologyBuilder and define a bolt class that
extends ShellBolt and implements IRichBolt. After
that, we can write the bolt logic in Python.
• Note: the numbers 3 and 6 in the program are
adjustable parameters: the bolt's parallelism and the
number of workers, respectively.
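8. SVMDRPCTopology.java
package storm.jimmy;

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.drpc.LinearDRPCTopologyBuilder;
import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

public class SVMDRPCTopology {
    // Java wrapper that hands every tuple to the Python process running svm_bolt.py.
    public static class SVMBolt extends ShellBolt implements IRichBolt {
        public SVMBolt() {
            super("python", "svm_bolt.py");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // DRPC bolts emit the request id first, followed by the result.
            declarer.declare(new Fields("id", "result"));
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null; // no component-specific configuration
        }
    }

    public static void main(String[] args) throws Exception {
        // "svm" is the DRPC function name that clients pass to execute().
        LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("svm");
        builder.addBolt(new SVMBolt(), 3); // 3 = bolt parallelism
        Config conf = new Config();
        conf.setNumWorkers(6); // 6 = number of workers
        StormSubmitter.submitTopology("svm", conf, builder.createRemoteTopology());
    }
}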
9. Step 2. Build the Storm DRPC topology with Python Bolt.
• We write svm_bolt.py in
storm-starter/multilang/resources.
• Note that all files in this directory are packed into
the jar file, so the SVM model file is also placed
there.
• Bolt in Python:
– Extend storm.BasicBolt
– Implement initialize() and process()
– Dump exception messages to a file for debugging.
10. svm_bolt.py
import json
import pickle as pkl
import traceback

from numpy import array
import storm  # Storm's multilang helper module (storm.py)

def log_exception():
    '''Append the current traceback to a file for debugging.'''
    with open('/tmp/trace_svm_bolt.txt', 'a') as f:
        traceback.print_exc(file=f)

class SVMBolt(storm.BasicBolt):
    def initialize(self, stormconf, context):
        '''initialize your members here.'''
        try:
            with open('svm_model.pkl', 'rb') as f:
                self.model = pkl.load(f)
        except Exception:
            log_exception()

    def process(self, tup):
        '''We serialize the input and output by json for convenience.'''
        try:
            # tup.values[0] is the DRPC request id; tup.values[1] is the JSON input.
            data = array(json.loads(tup.values[1]))
            result = self.model.predict(data)
            storm.emit([tup.values[0], json.dumps(result.tolist())])
        except Exception:
            log_exception()

if __name__ == '__main__':
    try:
        SVMBolt().run()
    except Exception:
        log_exception()
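The JSON round trip performed by process() can be tried locally without Storm. The following is a minimal sketch under stated assumptions: StubModel is a hypothetical stand-in for the pickled SVM, handle_request is a hypothetical helper that mirrors process(), and plain Python lists replace numpy arrays.

```python
import json


class StubModel(object):
    """Hypothetical stand-in for the pickled SVM: label 1 if a row sums to > 0."""

    def predict(self, rows):
        return [1 if sum(row) > 0 else 0 for row in rows]


def handle_request(model, values):
    """Mirror of process(): values is [request_id, json_encoded_rows]."""
    request_id, payload = values
    rows = json.loads(payload)          # decode the client's input
    result = model.predict(rows)        # run the prediction
    # Echo the request id first so the reply can be matched to the caller.
    return [request_id, json.dumps(list(result))]


reply = handle_request(StubModel(), ["42", json.dumps([[0.5, 1.0], [-2.0, 0.1]])])
print(reply)
```

Keeping the request id as the first emitted value is what lets DRPC route each result back to the client that asked for it.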
11. Step 3. Deploy the topology to Storm.
• Commands:
/storm-starter $ mvn -f m2-pom.xml package
– This generates jar files in the target directory.
/storm-starter $ storm jar target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.jimmy.SVMDRPCTopology
– Submit the topology.
$ storm list
– Check whether the topology is running.
12. Step 4. Build the Storm DRPC Client.
• We'll use the Python API generated by
Thrift to connect to the DRPC server. The required
files are in the storm directory, which comes from
https://github.com/nathanmarz/storm
• For background on Thrift, refer
to http://thrift.apache.org/tutorial/
• The client (predict_model.py):
1. Open a connection.
2. Call the service with execute('svm', data_to_predict).
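The two client steps might look roughly like the sketch below. It is not the actual predict_model.py. It assumes the Thrift Python library is installed, that the generated storm package (with its DistributedRPC module) from the storm repository is on the Python path, and that the DRPC server listens on localhost at Storm's default DRPC port 3772. The helper names encode_rows, decode_labels and predict are hypothetical.

```python
import json


def encode_rows(rows):
    """Serialize feature rows as JSON, the format the bolt decodes."""
    return json.dumps(rows)


def decode_labels(reply):
    """Decode the JSON list of predictions returned by the bolt."""
    return json.loads(reply)


def predict(rows, host="localhost", port=3772):
    """Send rows to the 'svm' DRPC function and return the predicted labels."""
    # Imported here so the helpers above work without Thrift installed.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from storm import DistributedRPC  # Thrift-generated code (assumed on the path)

    # 1. Open a connection (Storm's DRPC server uses a framed transport).
    transport = TTransport.TFramedTransport(TSocket.TSocket(host, port))
    client = DistributedRPC.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    try:
        # 2. Call the service registered under the name "svm".
        return decode_labels(client.execute("svm", encode_rows(rows)))
    finally:
        transport.close()
```

With the topology running, a call such as predict([[0.1, 0.2, 0.3]]) would return the labels for those rows; the shape of each row has to match the features the model was trained on.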