Scale out Python realtime service
           using Storm
    http://storm-project.net/

              Jimmy Lai
             2013/01/24
       r97922028 [at] ntu.edu.tw
Outline
• Set up a Storm cluster
• Storm DRPC
• Example: Build a real-time SVM prediction
  service with Storm DRPC
  – Steps 1-5
  – Live Demo




                  Storm http://storm-project.net   2
Set up a Storm cluster
•   https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster
• Configs:
     – Zookeeper
     – Storm
• Commands:
     – Zookeeper
         • bin/zkServer.sh start
     – Storm
         •   storm nimbus
         •   storm supervisor
         •   storm drpc
         •   storm ui
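
Once the daemons are started, a quick way to confirm they are reachable is to probe their TCP ports. The sketch below assumes the default ports (ZooKeeper 2181, Nimbus 6627, DRPC 3772, UI 8080) and that everything runs on localhost; the ports will differ if zoo.cfg or storm.yaml overrides them. The names `DEFAULT_PORTS`, `port_open` and `cluster_status` are illustrative, and the supervisor is left out because this sketch only probes daemons with a client-facing port.

```python
import socket

# Default ports; assumed unchanged in zoo.cfg / storm.yaml.
DEFAULT_PORTS = {
    "zookeeper": 2181,
    "nimbus": 6627,
    "drpc": 3772,
    "ui": 8080,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cluster_status(host="localhost", ports=DEFAULT_PORTS):
    """Map each daemon name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `cluster_status()` returns a dict such as `{"zookeeper": True, ...}`; a False entry suggests the corresponding daemon is not running or listens on a non-default port.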

Storm DRPC
• https://github.com/nathanmarz/storm/wiki/Distributed-RPC
• DRPC daemon receives requests and distributes
  those requests to user-defined Bolt/Topology.
• We follow the examples in
  https://github.com/nathanmarz/storm-starter to build a
  Python DRPC service.
• The benefits provided by Storm:
   – Load balancing and resource allocation
   – Real-time service
   – Fault tolerance
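
To make the request flow concrete, here is a toy, in-process model of what DRPC does: a request for a named function receives an id, is handed to the handler registered under that name, and the result is matched back to the caller by id. This is not Storm's API; `ToyDRPCServer` and its methods are hypothetical names used only for illustration.

```python
import itertools


class ToyDRPCServer:
    """In-process stand-in for the DRPC daemon (illustration only)."""

    def __init__(self):
        self._handlers = {}              # function name -> callable
        self._ids = itertools.count(1)   # request id generator

    def register(self, name, handler):
        """Play the role of a deployed topology serving `name`."""
        self._handlers[name] = handler

    def execute(self, name, args):
        """Route a request to its handler and return the result string."""
        if name not in self._handlers:
            raise KeyError("no topology registered for %r" % name)
        request_id = next(self._ids)
        # The handler sees (id, args) and returns (id, result), like the bolt.
        result_id, result = self._handlers[name](request_id, args)
        if result_id != request_id:
            raise RuntimeError("result does not match the request id")
        return result


server = ToyDRPCServer()
server.register("exclaim", lambda rid, args: (rid, args + "!"))
print(server.execute("exclaim", "hello"))  # prints "hello!"
```

In the real system the handler is a topology running on the cluster, which is where the load balancing and fault tolerance come from.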
Example: Build a real-time SVM
  prediction service with Storm DRPC
• Goal: We have a trained SVM model, and plan
  to provide a real-time prediction service.
  – Steps:
     1.   Train the SVM model.
     2.   Build the Storm DRPC topology with Python Bolt.
     3.   Deploy the topology to storm.
     4.   Build the Storm DRPC Client.
     5.   Prediction on the fly.
• Code repository: storm_demo directory in
  https://bitbucket.org/noahsark/slideshare
Step 1. Train the SVM model.
• Note: the following code is in the storm_demo dir.

$ ./train_model.py

• We use the 20 newsgroups data from sklearn to
  build an SVM classification model.
• The output model is a pickle file (svm_model.pkl)
  in storm-starter/multilang/resources/
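
train_model.py itself is not shown on the slides. The sketch below only illustrates the hand-off that the later steps rely on: model parameters are pickled to a file and loaded back before predicting. The dict of hand-picked weights is a hypothetical stand-in for the trained scikit-learn SVM, `predict`, `save_model` and `load_model` are illustrative names, and the file is written to a temporary directory rather than storm-starter/multilang/resources/.

```python
import os
import pickle
import tempfile


def predict(model, rows):
    """Linear decision rule: label 1 if w.x + b > 0, else 0."""
    w, b = model["weights"], model["bias"]
    return [1 if sum(wi * xi for wi, xi in zip(w, row)) + b > 0 else 0
            for row in rows]


def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Hand-picked parameters, not the result of real training.
model = {"weights": [1.0, -1.0], "bias": 0.0}
path = os.path.join(tempfile.mkdtemp(), "svm_model.pkl")
save_model(model, path)
restored = load_model(path)
print(predict(restored, [[2, 1], [0, 3]]))  # prints [1, 0]
```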

Step 2. Build the Storm DRPC topology
           with Python Bolt.
• The storm-starter dir comes from https://github.com/nathanmarz/storm-starter. It contains
  many topology examples; we'll build our DRPC topology in
  storm-starter/src/jvm/storm/jimmy: SVMDRPCTopology.java
• We build the DRPC topology with LinearDRPCTopologyBuilder and write a Bolt class
  that extends ShellBolt and implements IRichBolt. After that we can write the Bolt
  logic in Python.
• Note: the numbers 3 and 6 in the program are adjustable parameters: the parallelism
  of the bolt and the number of workers.
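package storm.jimmy;

// Imports assume the backtype.storm packages used by storm-starter at the time.
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.drpc.LinearDRPCTopologyBuilder;
import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import java.util.Map;

public class SVMDRPCTopology {
  // The bolt logic lives in svm_bolt.py; ShellBolt launches it as a subprocess.
  public static class SVMBolt extends ShellBolt implements IRichBolt {
    public SVMBolt() {
      super("python", "svm_bolt.py");
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("id", "result"));
    }
    @Override
    public Map<String, Object> getComponentConfiguration() {
      return null;
    }
  }
  public static void main(String[] args) throws Exception {
    // "svm" is the DRPC function name that clients call.
    LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("svm");
    builder.addBolt(new SVMBolt(), 3);  // parallelism of the bolt
    Config conf = new Config();
    conf.setNumWorkers(6);              // number of workers
    StormSubmitter.submitTopology("svm", conf, builder.createRemoteTopology());
  }
}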
Step 2. Build the Storm DRPC topology
           with Python Bolt.
• We write svm_bolt.py in storm-starter/multilang/resources
• Note that all files in this dir are packed into the
  jar file, so the SVM model file is also placed in
  this dir.
• Bolt in Python:
   – Extend storm.BasicBolt
   – Implement initialize() and process()
   – Dump exception messages to a file for debugging.
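
Behind storm.BasicBolt, the Python process talks to the JVM over stdin/stdout: each message is a JSON document followed by a line containing only `end`. The sketch below shows that framing on in-memory streams. It is a simplified illustration, not a replacement for the storm.py module shipped with storm-starter; `read_message` and `send_message` are illustrative names and the message fields shown are a minimal subset.

```python
import io
import json


def read_message(stream):
    """Read lines until the 'end' marker and decode them as one JSON value."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end' marker")
        if line.rstrip("\n") == "end":
            break
        lines.append(line)
    return json.loads("".join(lines))


def send_message(stream, message):
    """Write one JSON message followed by the 'end' marker."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


# Simulate the JVM sending one tuple: values are [request id, JSON payload].
incoming = io.StringIO()
send_message(incoming, {"id": "t1", "tuple": ["42", "[[0.1, 0.2]]"]})
incoming.seek(0)
tup = read_message(incoming)

# Reply the way a bolt would: emit [request id, JSON result].
outgoing = io.StringIO()
send_message(outgoing, {"command": "emit", "anchors": [tup["id"]],
                        "tuple": [tup["tuple"][0], json.dumps([3])]})
print(outgoing.getvalue())
```

Because stdout carries this protocol, stray print statements in a bolt can corrupt the stream, which is one reason to write debugging output to a file instead.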

# svm_bolt.py
import json
import pickle as pkl
import traceback

from numpy import array

import storm  # multilang helper module shipped in multilang/resources

TRACE_FILE = '/tmp/trace_svm_bolt.txt'


def log_exception():
    '''Append the current traceback to a file, since stdout is used by Storm.'''
    with open(TRACE_FILE, 'a') as f:
        traceback.print_exc(file=f)


class SVMBolt(storm.BasicBolt):
    def initialize(self, stormconf, context):
        '''Initialize your members here.'''
        try:
            with open('svm_model.pkl', 'rb') as f:
                self.model = pkl.load(f)
        except Exception:
            log_exception()

    def process(self, tup):
        '''We serialize the input and output as JSON for convenience.'''
        try:
            # tup.values[0] is the request id, tup.values[1] the JSON payload.
            data = array(json.loads(tup.values[1]))
            result = self.model.predict(data)
            storm.emit([tup.values[0], json.dumps(result.tolist())])
        except Exception:
            log_exception()


if __name__ == '__main__':
    try:
        SVMBolt().run()
    except Exception:
        log_exception()
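
The JSON round trip in process() can be exercised without a running cluster. The sketch below isolates that logic in a plain function and uses a stub in place of the pickled SVM. `handle_request` and `StubModel` are hypothetical names, and plain lists replace the numpy array so the example has no third-party dependencies.

```python
import json


class StubModel:
    """Stand-in for the pickled SVM: labels each row by its argmax index."""

    def predict(self, rows):
        return [row.index(max(row)) for row in rows]


def handle_request(model, values):
    """Mirror process(): values[0] is the request id, values[1] a JSON matrix."""
    request_id, payload = values
    rows = json.loads(payload)
    labels = model.predict(rows)
    return [request_id, json.dumps(labels)]


out = handle_request(StubModel(), ["req-1", "[[0.1, 0.9], [0.7, 0.2]]"])
print(out)  # prints ['req-1', '[1, 0]']
```

Keeping the request id as the first output value matters: it is how the result is matched back to the waiting DRPC request.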
Step 3. Deploy the topology to storm.
• Commands:
/storm-starter $ mvn -f m2-pom.xml package
  – This will generate jar files in target dir.
/storm-starter $ storm jar target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.jimmy.SVMDRPCTopology
  – Submit topology
$ storm list
  – Check whether the topology is running
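
The three commands can be wrapped in a small script so that a failed build stops the deployment. This is a sketch that assumes `mvn` and `storm` are on the PATH and that it is pointed at the storm-starter directory; `deploy_commands` and `deploy` are illustrative names.

```python
import subprocess

JAR = "target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar"
MAIN_CLASS = "storm.jimmy.SVMDRPCTopology"


def deploy_commands(jar=JAR, main_class=MAIN_CLASS):
    """Return the build, submit, and check commands as argv lists."""
    return [
        ["mvn", "-f", "m2-pom.xml", "package"],
        ["storm", "jar", jar, main_class],
        ["storm", "list"],
    ]


def deploy(project_dir):
    """Run each command in project_dir, stopping at the first failure."""
    for argv in deploy_commands():
        subprocess.check_call(argv, cwd=project_dir)
```

A call such as `deploy("storm-starter")` runs the steps in order; `check_call` raises `CalledProcessError` if any command exits with a non-zero status.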
Step 4. Build the Storm DRPC Client.
• We'll use the Python API generated by
  Thrift to connect to the DRPC server. The required
  files are in the storm dir, which comes from
  https://github.com/nathanmarz/storm
• For background on Thrift, refer
  to http://thrift.apache.org/tutorial/
• The client (predict_model.py):
   1. Construct the connection
   2. Call the service with execute('svm', data_to_predict)

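# predict_model.py
from thrift import Thrift
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

# Thrift-generated modules from the storm dir (assumed to be on sys.path).
from storm import DistributedRPC
from storm.ttypes import DRPCExecutionException


class Client(DistributedRPC.Iface):
    def __init__(self, host='localhost', port=3772, timeout=6000):
        try:
            socket = TSocket.TSocket(host, port)
            socket.setTimeout(timeout)  # milliseconds
            self.conn = TTransport.TFramedTransport(socket)
            self.client = DistributedRPC.Client(
                TBinaryProtocol.TBinaryProtocol(self.conn))
            self.conn.open()
        except Thrift.TException as exc:
            print(exc)

    def close(self):
        self.conn.close()

    def execute(self, func, args):
        try:
            return self.client.execute(func, args)
        except DRPCExecutionException as exc:
            print(exc)
        except Thrift.TException as exc:
            print(exc.message)  # message is an attribute, not a method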
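
Since the bolt exchanges JSON strings, the caller has to encode the feature rows before execute() and decode the reply afterwards. The sketch below wraps that in a helper and tests it against a stub so it runs without a cluster; `predict` and `FakeClient` are hypothetical names, and the stub's reply is made up for the demonstration.

```python
import json


def predict(client, rows, func="svm"):
    """Send rows as JSON to the DRPC function and decode the JSON reply."""
    reply = client.execute(func, json.dumps(rows))
    if reply is None:  # execute() returns None after a logged error
        raise RuntimeError("DRPC call failed")
    return json.loads(reply)


class FakeClient:
    """Stub with the same execute() shape, for testing without a cluster."""

    def execute(self, func, args):
        rows = json.loads(args)
        return json.dumps([len(row) for row in rows])


print(predict(FakeClient(), [[1, 2, 3], [4, 5]]))  # prints [3, 2]
```

Against a real cluster the stub would be replaced by an instance of the Client class, followed by a call to close() when done.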
Step 5. Prediction on the fly.
$ ./predict_model.py
data prepared
data predicted
             precision    recall  f1-score   support

          0       1.00      1.00      1.00         1
          1       0.50      0.67      0.57         3
          2       1.00      0.50      0.67         2
          3       1.00      0.75      0.86         4
          4       0.50      0.50      0.50         2
          5       1.00      0.50      0.67         2
          6       1.00      1.00      1.00         4
          7       0.50      1.00      0.67         1
          8       1.00      1.00      1.00         2
          9       0.80      1.00      0.89         4
         10       1.00      0.50      0.67         4
         11       1.00      1.00      1.00         1
         13       1.00      0.50      0.67         2
         14       1.00      1.00      1.00         1
         15       1.00      1.00      1.00         2
         16       0.33      0.33      0.33         3
         17       0.33      1.00      0.50         1
         18       0.00      0.00      0.00         1
         19       0.00      0.00      0.00         0

avg / total       0.81      0.72      0.74        40

(Live Demo)
• We can run many clients and get the prediction results on the fly.
• The clients can be written in many different languages with Thrift.
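
Running many clients at once can be sketched with a thread pool, one client per batch of rows. The example assumes Python 3 (`concurrent.futures`) and uses a stub in place of the Thrift client so it runs without a cluster; `FakeClient` and `run_batch` are hypothetical names, and the stub simply sums each row.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class FakeClient:
    """Stub standing in for the Thrift DRPC client."""

    def execute(self, func, args):
        return json.dumps([sum(row) for row in json.loads(args)])


def run_batch(batch):
    """One client per batch: connect, call, decode."""
    client = FakeClient()  # would be Client(host, port) against a real cluster
    return json.loads(client.execute("svm", json.dumps(batch)))


batches = [[[1, 2]], [[3, 4], [5, 6]], [[7, 8]]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_batch, batches))
print(results)  # prints [[3], [7, 11], [15]]
```

`pool.map` returns results in the order of the input batches, so each reply can be paired with the request that produced it.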
