For organizations with unbounded data sources, it is important to analyze, learn, predict, and even respond in real time, directly from streaming data. This matters most when:
• Data volumes are large, or moving raw data is expensive,
• Data is generated by widely distributed assets (e.g., mobile devices),
• Data is of ephemeral value and analysis can’t wait, or
• It is critical to always have the latest insight, and extrapolation won’t do.
Use cases include predicting failures on assembly lines, predicting traffic flows in cities, predicting demand on power grids, detecting hackers, and understanding connection quality in mobile networks. What they share is a need to know now, which requires real-time processing of streaming data. Our goal is to enable real-time stream processing for Apache Pulsar, in which analysis, learning, and prediction are done on the fly, with continuous insights streamed back to the broker.