███████╗██╗     ██╗   ██╗██╗  ██╗
██╔════╝██║     ██║   ██║╚██╗██╔╝
█████╗  ██║     ██║   ██║ ╚███╔╝
██╔══╝  ██║     ██║   ██║ ██╔██╗
██║     ███████╗╚██████╔╝██╔╝ ██╗
╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
Apache Storm
Frictionless Topology Configuration & Deployment
P. Taylor Goetz, Hortonworks
@ptgoetz
Storm BoF - Hadoop Summit Brussels 2015
About me…
• VP - Apache Storm
• ASF Member
• Member of Technical Staff, Hortonworks
What is Flux?
• An easier way to configure and deploy Apache Storm topologies
• A YAML DSL for defining and configuring Storm topologies
• And more…
Why Flux?
Because seeing duplication of
effort makes me sad…
What’s wrong here?
public static void main(String[] args) throws Exception {
    String name = "myTopology";
    StormTopology topology = getTopology();
    // logic to determine if we're running locally or not…
    boolean runLocal = shouldRunLocal();
    // create necessary config options…
    Config conf = new Config();
    if (runLocal) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, conf, topology);
    } else {
        StormSubmitter.submitTopology(name, conf, topology);
    }
}
• Configuration tightly coupled with code.
• Changes require recompilation & repackaging.
Wouldn’t this be easier?
storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml
OR…
storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml
Flux allows you to package all your Storm components once.
Then wire, configure, and deploy topologies using a YAML
definition.
Flux Features
• Easily configure and deploy Storm topologies (both the Storm core and micro-batch
APIs) without embedding configuration in your topology code
• Support for existing topology code
• Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
• YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs,
storm-hbase, etc.)
• Convenient support for multi-lang components
• External property substitution/filtering for easily switching between
configurations/environments (similar to Maven-style ${variable.name}
substitution)
Flux YAML DSL
YAML Definition Consists of:
• Topology Name (1)
• Includes (0…*)
• Config Map (0…1)
• Components (0…*)
• Spouts (1…*)
• Bolts (1…*)
• Streams (1…*)
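Put together, a definition with these sections might look like the sketch below. The ids and class names are reused from the examples in this deck; the combination itself is only illustrative, and includes (which are optional) are left out.

```yaml
# Minimal Flux topology definition (illustrative sketch)
name: "myTopology"                      # topology name (exactly one)

config:                                 # optional Storm config map
  topology.workers: 1

components:                             # optional catalog of reusable objects
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"

spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    constructorArgs:
      - ["node", "randomsentence.js"]   # command line
      - ["word"]                        # output fields
    parallelism: 1

bolts:
  - id: "count"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1

streams:                                # edges between spouts and bolts
  - from: "sentence-spout"
    to: "count"
    grouping:
      type: SHUFFLE
```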
Flux YAML DSL
Config
A Map-of-Maps (Objects) that will be passed to the topology at
submission time (Storm config).
# topology name
name: "myTopology"
# topology configuration
config:
  topology.workers: 5
  topology.max.spout.pending: 1000
  # ...
Components
• Catalog (list/map) of Objects that can be used/referenced in other
parts of the YAML configuration
• Roughly analogous to Spring beans.
Components
Simple Java class with default constructor:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
Components: Constructor Arguments
Component classes can be instantiated with “constructorArgs” (a list of
class constructor arguments):
# Components
components:
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "localhost:2181"
Components: References
Components can be “referenced” throughout the YAML config and
used as arguments:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
  - id: "stringMultiScheme"
    className: "backtype.storm.spout.SchemeAsMultiScheme"
    constructorArgs:
      - ref: "stringScheme"
Components: Properties
Components can be configured using JavaBean setter methods and
public instance variables:
- id: "spoutConfig"
  className: "storm.kafka.SpoutConfig"
  properties:
    - name: "forceFromStart"
      value: true
    - name: "scheme"
      ref: "stringMultiScheme"
Components: Config Methods
Call arbitrary methods to configure a component:
- id: "recordFormat"
  className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
  configMethods:
    - name: "withFieldDelimiter"
      args: ["|"]
References can be used here as well.
Spouts
A list of objects that implement the IRichSpout interface and an
associated parallelism setting.
# spout definitions
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    # shell spout constructor takes 2 arguments: String[], String[]
    constructorArgs:
      # command line
      - ["node", "randomsentence.js"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
Bolts
A list of objects that implement the IRichBolt or IBasicBolt interface with
an associated parallelism setting.
# bolt definitions
bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line
      - ["python", "splitsentence.py"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
  - id: "count"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1
    # ...
Spout and Bolt definitions are just extensions of
“Component” with a “parallelism” attribute, so all
component features (references, constructor
args, properties, config methods) can be used.
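For example, a bolt definition can combine config methods and references with a parallelism setting. The sketch below assumes the storm-hdfs HdfsBolt and the "recordFormat" component defined earlier in this deck. The bolt id and the file system URL are placeholders, and a real HdfsBolt needs further settings (file name format, sync and rotation policies) that are left out here.

```yaml
bolts:
  - id: "hdfs-bolt"                       # hypothetical id
    className: "org.apache.storm.hdfs.bolt.HdfsBolt"
    configMethods:
      - name: "withFsUrl"
        args: ["hdfs://localhost:8020"]   # placeholder URL
      - name: "withRecordFormat"
        args:
          - ref: "recordFormat"           # refers to an entry under `components`
    parallelism: 1
```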
Streams
• Represent Spout-to-Bolt and Bolt-to-Bolt connections
• In graph terms: “edges”
• Also define Stream Groupings:
• ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE,
FIELDS, GLOBAL, or NONE.
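As a sketch of how the built-in groupings are written, the stream definitions below connect the spout and bolts from the previous slides. The FIELDS grouping is shown with an `args` list naming the fields to group on; check the exact keys against the Flux README for your version.

```yaml
streams:
  # spout -> bolt: tuples are distributed randomly across the bolt's tasks
  - from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE

  # bolt -> bolt: tuples with the same "word" value go to the same task
  - from: "splitsentence"
    to: "count"
    grouping:
      type: FIELDS
      args: ["word"]
```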
Streams
Custom stream grouping:
- from: "bolt-1"
  to: "bolt-2"
  grouping:
    type: CUSTOM
    customClass:
      className: "backtype.storm.testing.NGrouping"
      constructorArgs:
        - 1
Again, you can use references, properties, and config methods.
Filtering/Variable Substitution
Define properties in an external properties file, and reference them in
YAML using ${} syntax:
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["${hdfs.dest.dir}"]
Will get replaced with value of property prior to YAML parsing.
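To illustrate, assume a properties file passed with `--filter` that contains the single, made-up entry `hdfs.dest.dir=/tmp/dest`. After substitution, the text handed to the YAML parser would read roughly as follows:

```yaml
# Result of filtering with the hypothetical property hdfs.dest.dir=/tmp/dest
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["/tmp/dest"]
```

Because the replacement happens on the raw text, the same YAML file can be reused across environments by swapping only the properties file.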
Filtering/Variable Substitution
Environment variables can be referenced in YAML using ${ENV-}
syntax:
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["${ENV-HDFS_DIR}"]
Will get replaced with value of $HDFS_DIR env variable prior to YAML parsing.
File Includes and Overrides
Include files/classpath resources and optionally override values:
name: "include-topology"
includes:
  - resource: true
    file: "/configs/shell_test.yaml"
    override: false  # otherwise subsequent includes that define 'name' would override
Existing Topologies & Trident Topologies
Existing Topologies
• Alternative to YAML Spout/Bolt/Stream DSL
• Same syntax
• Works with transactional/micro-batch (Trident) topologies
• Tell Flux about the class that will produce your topology
• Components, references, constructor args, properties, config
methods, etc. can all be used
Existing Topologies
Provide a class with a public method that returns a StormTopology
instance:
/**
 * Marker interface for objects that can produce `StormTopology` objects.
 *
 * If a `topology-source` class implements the `getTopology()` method, Flux will
 * call that method. Otherwise, it will introspect the given class and look for a
 * similar method that produces a `StormTopology` instance.
 *
 * Note that it is not strictly necessary for a class to implement this interface.
 * If a class defines a method with a similar signature, Flux should be able to find
 * and invoke it.
 *
 */
public interface TopologySource {
    public StormTopology getTopology(Map<String, Object> config);
}
This can be a Spout/Bolt or Trident topology.
Existing Topologies
Define a topologySource to tell Flux how to configure the class that
creates the topology:
# configuration that uses an existing topology that does not implement TopologySource
name: "existing-topology"
topologySource:
  className: "org.apache.storm.flux.test.SimpleTopology"
  methodName: "getTopologyWithDifferentMethodName"
  constructorArgs:
    - "foo"
    - "bar"
Components, references, constructor args, properties,
config methods, etc. can all be used.
Flux Usage
• Add the Flux dependency to your project.
• Use the Maven shade plugin to create a fat jar file.
• Use the `storm` command to run (locally) or deploy (remotely) your
topology:
storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>
Flux Usage: Command Line Options
usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux [options] <topology-config.yaml>
 -d,--dry-run                 Do not run or deploy the topology. Just build, validate,
                              and print information about the topology.
 -e,--env-filter              Perform environment variable substitution. Replace
                              keysidentified with `${ENV-[NAME]}` will be replaced with
                              the corresponding `NAME` environment value
 -f,--filter <file>           Perform property substitution. Use the specified file as a
                              source of properties, and replace keys identified with
                              {$[property name]} with the value defined in the
                              properties file.
 -i,--inactive                Deploy the topology, but do not activate it.
 -l,--local                   Run the topology in local mode.
 -n,--no-splash               Suppress the printing of the splash screen.
 -q,--no-detail               Suppress the printing of topology details.
 -r,--remote                  Deploy the topology to a remote cluster.
 -R,--resource                Treat the supplied path as a class path resource instead
                              of a file.
 -s,--sleep <ms>              When running locally, the amount of time to sleep (in ms.)
                              before killing the topology and shutting down the local
                              cluster.
 -z,--zookeeper <host:port>   When running in local mode, use the ZooKeeper at the
                              specified <host>:<port> instead of the in-process
                              ZooKeeper. (requires Storm 0.9.3 or later)
With great power comes great
responsibility.
It’s up to you to avoid shooting yourself in the foot!
Feedback/Contributions Welcome
Flux on GitHub: https://github.com/ptgoetz/flux
Thank you! AMA…
P. Taylor Goetz, Hortonworks
@ptgoetz
