4. Apache Storm Brief History
• 2010 – Created at BackType as a Distributed Stream Processing Framework
• 2011 – BackType Acquired by Twitter; Storm Open Sourced and Deployed at Twitter
• 2013 – Entered the Apache Incubator
• Present – Large-Scale Production Deployments
– Yahoo: 3,500+ Nodes
– Alibaba: 1 PB of Data per Day
5. Prior Release Highlights
• 0.9.x
– Storm Becomes an Apache TLP
– First Official Apache Release
– Expanded Kafka, HDFS, and HBase Integration
• 0.10.x
– Multi-Tenancy
– Rolling Upgrades
– Improved Logging (Log4j2)
– JDBC, Event Hubs, and Hive Integration
6. Prior Release Highlights (Continued)
• 1.0
– Pacemaker (Replaces ZooKeeper for Heartbeats)
– Security (Kerberos/Digest Authentication)
– Nimbus HA (Eliminates Single Point of Failure)
– Supervisor Health Checks
– Resource Aware Scheduler
9. Apache Storm 1.1.0
March 29, 2017
• AWS Kinesis Support
• HDFS Spout
• Other Enhancements
– Flux
– Topology Deployment
– Resource Aware Scheduler
10. Streaming SQL
• Apache Calcite for Query Parsing/Planning
• Define Topologies Using SQL-Like Queries
• SQL Compiled and Transformed into a Trident Topology
• Streaming from/to Arbitrary Data Sources
– Kafka, Redis, HDFS, MongoDB
– Extensible by Implementing ISqlTridentDataSource
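To make the extensibility point concrete, here is a minimal Java sketch of a pluggable data source registry. Every type in it (SketchDataSource, SketchDataSourceProvider, InMemoryProvider, DataSourceRegistrySketch) is hypothetical and is not Storm's ISqlTridentDataSource API. It assumes, for illustration only, that a provider is chosen from the URI scheme of a table's location string, such as kafka://localhost:2181/brokers?topic=orders or a made-up mem:// scheme.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a pluggable SQL data source.
// These names are NOT Storm's API; they only illustrate the plug-in idea.
interface SketchDataSource {
    List<String> read();        // rows available to a query (source side)
    void write(String row);     // rows emitted by a query (sink side)
}

interface SketchDataSourceProvider {
    String scheme();                              // e.g. "kafka", "redis", "mem"
    SketchDataSource construct(URI location);     // builds a source/sink for one table
}

// In-memory provider used only for illustration and testing.
class InMemoryProvider implements SketchDataSourceProvider {
    @Override
    public String scheme() {
        return "mem";
    }

    @Override
    public SketchDataSource construct(URI location) {
        List<String> rows = new ArrayList<>();
        return new SketchDataSource() {
            @Override
            public List<String> read() {
                return rows;
            }

            @Override
            public void write(String row) {
                rows.add(row);
            }
        };
    }
}

class DataSourceRegistrySketch {
    private final Map<String, SketchDataSourceProvider> providers = new HashMap<>();

    // Adding a new storage system only requires registering one more provider.
    void register(SketchDataSourceProvider provider) {
        providers.put(provider.scheme(), provider);
    }

    // Picks a provider from the URI scheme of the table's location string.
    SketchDataSource open(String location) {
        URI uri = URI.create(location);
        SketchDataSourceProvider provider = providers.get(uri.getScheme());
        if (provider == null) {
            throw new IllegalArgumentException("No provider for scheme: " + uri.getScheme());
        }
        return provider.construct(uri);
    }
}
```

With this shape, the query layer only sees the SketchDataSource interface and never references the concrete storage system behind a table.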
11. Streaming SQL
• Tuple Filtering
• Projections
• CSV, TSV, and Avro Input/Output Formats
• User Defined Functions (UDFs)
• Fine-Grained User Control over Parallelism of Generated Components
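As a rough illustration of how these operators compose per tuple, the following Java sketch evaluates the equivalent of SELECT ID, UNIT_PRICE * QUANTITY AS TOTAL FROM ORDERS WHERE QUANTITY > 1 over CSV input and emits TSV. The class, the column layout, and the total() method standing in for a user-defined function are hypothetical; this is not the Trident topology that Storm SQL actually generates, and the CSV parser assumes no quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical per-tuple pipeline: CSV decode -> filter -> projection (with UDF) -> TSV encode.
class SqlOperatorsSketch {
    // Stand-in for a scalar UDF: a plain static method applied to each tuple.
    static int total(int unitPrice, int quantity) {
        return unitPrice * quantity;
    }

    // Decodes one CSV record into an int tuple (no quoting support; simplification).
    static int[] parseCsv(String line) {
        String[] parts = line.split(",");
        int[] tuple = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            tuple[i] = Integer.parseInt(parts[i].trim());
        }
        return tuple;
    }

    // Applies the WHERE predicate, projects ID and TOTAL, and encodes each result as TSV.
    static List<String> run(List<String> csvLines, Predicate<int[]> where) {
        List<String> out = new ArrayList<>();
        for (String line : csvLines) {
            int[] t = parseCsv(line);           // assumed columns: ID, UNIT_PRICE, QUANTITY
            if (!where.test(t)) {
                continue;                        // tuple filtering (WHERE clause)
            }
            int id = t[0];
            int total = total(t[1], t[2]);       // projection using the UDF stand-in
            out.add(id + "\t" + total);          // TSV output
        }
        return out;
    }
}
```

For example, run(List.of("1,10,2", "2,5,1", "3,7,3"), t -> t[2] > 1) drops the second record and returns the TSV lines for IDs 1 and 3 with totals 20 and 21.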
13. Streaming SQL – Example [1]
• Read Apache HTTPD server logs from Kafka
• Filter out everything but error log events
• Write the error events onto a Kafka topic
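The logic of this example can be sketched in plain Java, with the Kafka input and output topics replaced by in-memory lists. The class name and the record fields are hypothetical, and the sketch assumes an "error event" means a request whose HTTP status is in the 5xx range; the predicate would need widening if 4xx responses should count as errors too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the example's filtering step; Kafka is replaced by in-memory lists.
// Assumption: an "error event" is a request whose HTTP status is 5xx.
class ErrorLogFilterSketch {
    // Simplified view of one parsed HTTPD log line (fields chosen for illustration).
    static final class LogEvent {
        final String remoteIp;
        final String requestUrl;
        final int status;

        LogEvent(String remoteIp, String requestUrl, int status) {
            this.remoteIp = remoteIp;
            this.requestUrl = requestUrl;
            this.status = status;
        }
    }

    // Keeps only 5xx events, in input order; everything else is dropped.
    static List<LogEvent> errorsOnly(List<LogEvent> input) {
        List<LogEvent> errors = new ArrayList<>();
        for (LogEvent e : input) {
            if (e.status / 100 == 5) {
                errors.add(e);
            }
        }
        return errors;
    }
}
```

In the streaming version, the returned list corresponds to the events written onto the output Kafka topic.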
17. PMML Support (Machine Learning)
• Predictive Model Markup Language
• Describes Models Learned by ML Algorithms
• PmmlPredictorBolt Computes Predicted Scores for Live Tuples According to the PMML Model
• PMML Model Can Be Uploaded to or Downloaded from the Distributed Cache
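The scoring step can be illustrated with a small Java sketch that reads a linear model from a simplified PMML RegressionModel fragment, using the JDK's built-in DOM parser, and computes a predicted score for each incoming tuple. This is not PmmlPredictorBolt's implementation: the class name is hypothetical, the fragment omits elements such as DataDictionary and MiningSchema that a complete PMML document includes, and missing input fields are treated as 0 as a simplification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: a linear model read from a simplified PMML RegressionTable.
class PmmlScoringSketch {
    private final double intercept;
    private final Map<String, Double> coefficients = new HashMap<>();

    PmmlScoringSketch(String pmmlXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));
        // Only the first RegressionTable is used (single-table regression model).
        Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
        intercept = Double.parseDouble(table.getAttribute("intercept"));
        NodeList predictors = table.getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            coefficients.put(p.getAttribute("name"),
                    Double.parseDouble(p.getAttribute("coefficient")));
        }
    }

    // Predicted score = intercept + sum(coefficient * field value) for one live tuple.
    // Missing fields count as 0 (simplification, not PMML's missing-value semantics).
    double score(Map<String, Double> tuple) {
        double y = intercept;
        for (Map.Entry<String, Double> c : coefficients.entrySet()) {
            y += c.getValue() * tuple.getOrDefault(c.getKey(), 0.0);
        }
        return y;
    }
}
```

With intercept 1.0 and coefficients x = 2.0 and y = -0.5, a tuple {x = 3.0, y = 4.0} scores 1 + 6 - 2 = 5.0. In a bolt, the model would be built once when the component starts and score() called per tuple.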