2. Xin Wang
• Apache Storm Committer & PMC member
• Five years of distributed systems experience
• Love open source
• Focus on distributed technologies, especially stream processing
• https://github.com/vesense
14. User Behavior Analysis 2.0
[Architecture diagram. Data sources: 3G/4G, WLAN, Cable, Other. Kafka topics: Topic-In-1 ... Topic-In-N, Topic-Out-1 ... Topic-Out-N. Storm topologies: Topology-1, Topology-2, Topology-3 ... Topology-N. Storage: HBase, HDFS/Hive, MySQL, Solr.]
15. User Behavior Analysis 3.0 - Neymar
• Unified streaming API
• Pluggable Tagging module
• Dynamic Rule/Schema update
• UDF support for Schema fields
• Built-in main Serializer/Deserializer
• 12 nodes, each with 16 vcores and 32 GB RAM
• 60 billion/day, about 700K TPS (60 billion / 86,400 s ≈ 694K per second)
• No data loss
16. Issues & Practices
• STORM-643 KafkaUtils repeatedly fetches messages whose offset is out of range
• STORM-329 Fix cascading Storm failure by improving reconnection strategy and buffering messages
• STORM-763 Nimbus reassigned worker A to another machine, but other workers' Netty clients can't
connect to the new worker A
• Workers went down because of Kafka ZooKeeper-related bugs: KAFKA-1382, KAFKA-1387, KAFKA-1451,
KAFKA-1585
• Beware of incompatible changes when upgrading Storm from 0.9 to 1.0: the Scheme method changed
from `deserialize(byte[] ser)` to `deserialize(ByteBuffer ser)`
• Workers suffer heavy GCs when they hold a large in-memory cache
• Put lightweight logic into the same bolt
• Do NOT set too many executors/tasks in one worker; higher parallelism sometimes causes lower
throughput
• MQ -> Streaming -> MQ is better than MQ -> Streaming -> HDFS (throughput, latency, stability)
• Never log more than necessary
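The Scheme change listed above (`deserialize(byte[] ser)` to `deserialize(ByteBuffer ser)`) means existing decoding code has to read bytes out of a buffer instead of receiving an array. A minimal sketch of one way to do that with the JDK only; `ByteBufferDecoding` and its methods are hypothetical helper names, not part of Storm:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Storm: shows the kind of decoding a
// 1.0-style deserialize(ByteBuffer ser) implementation has to perform.
final class ByteBufferDecoding {
    private ByteBufferDecoding() {}

    /** Copies the remaining bytes without moving the caller's position. */
    static byte[] toBytes(ByteBuffer ser) {
        ByteBuffer view = ser.duplicate();        // independent position/limit
        byte[] out = new byte[view.remaining()];
        view.get(out);                            // reads position..limit only
        return out;
    }

    /** Decodes the remaining bytes as a UTF-8 string. */
    static String toUtf8String(ByteBuffer ser) {
        return new String(toBytes(ser), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "user-42,click".getBytes(StandardCharsets.UTF_8);
        // 0.9-style code received byte[]; 1.0-style code receives a ByteBuffer.
        ByteBuffer ser = ByteBuffer.wrap(raw);
        System.out.println(toUtf8String(ser));
    }
}
```

Copying through `duplicate()` rather than calling `ser.array()` is a deliberate choice here: `array()` returns the whole backing array regardless of position and limit, and it throws for read-only or direct buffers.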
18. Storm-RocketMQ
• RocketMQSpout - Currently only RocketMQ push mode is supported; pull mode is planned.
The default deserializer is StringScheme; you can override it by setting
`RocketMQConfig.SCHEME`.
• RocketMQBolt - Sends asynchronously by default; you can change this by invoking
`withAsync(boolean async)`
• RocketMQState - For users using Storm Trident API
• TopicSelector - Selecting a topic based on the input Storm tuple
• TupleToMessageMapper - Mapping a Storm tuple to a RocketMQ message. You can
implement the MessageBodySerializer interface to serialize the message body. The
default implementation of MessageBodySerializer is
`body.toString().getBytes(StandardCharsets.UTF_8)`
• MessageRetryManager - Retry policy for failed messages
For more details, see: https://github.com/apache/storm/tree/master/external/storm-rocketmq
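To make the TopicSelector and MessageBodySerializer roles above concrete, here is a minimal sketch that uses simplified stand-in interfaces. `BodySerializerSketch`, `TopicSelectorSketch` and `RocketMqMappingSketch` are hypothetical names, and a plain `Map` stands in for a Storm tuple; the real storm-rocketmq interfaces may have different signatures. Only the default serialization expression is taken from the slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; the real storm-rocketmq
// interfaces operate on Storm tuples and may have different signatures.
interface BodySerializerSketch {
    byte[] serialize(Object body);
}

interface TopicSelectorSketch {
    String getTopic(Map<String, Object> tuple);
}

final class RocketMqMappingSketch {
    private RocketMqMappingSketch() {}

    // Mirrors the default quoted on the slide: body.toString() as UTF-8 bytes.
    static final BodySerializerSketch DEFAULT_SERIALIZER =
            body -> body.toString().getBytes(StandardCharsets.UTF_8);

    // Hypothetical selector: take the topic from a tuple field, else fall back.
    static TopicSelectorSketch byField(String field, String fallbackTopic) {
        return tuple -> {
            Object topic = tuple.get(field);
            return topic != null ? topic.toString() : fallbackTopic;
        };
    }

    public static void main(String[] args) {
        // A Map stands in for a Storm tuple in this sketch.
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("topic", "Topic-Out-1");
        tuple.put("body", 12345);

        TopicSelectorSketch selector = byField("topic", "Topic-Out-N");
        byte[] payload = DEFAULT_SERIALIZER.serialize(tuple.get("body"));

        System.out.println(selector.getTopic(tuple) + " <- " + payload.length + " bytes");
    }
}
```

A custom serializer would replace `DEFAULT_SERIALIZER` when `toString()` is not a suitable wire format, for example for binary or schema-based payloads.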