Easily Build a Smart Pulsar Stream Processor_Simon Crosby

For organizations with boundless data sources, it is important to analyze, learn, predict and even respond in real time, directly from streaming data. This matters most when:

• Data volumes are large, or moving raw data is expensive,
• Data is generated by widely distributed assets (e.g. mobile devices),
• Data is of ephemeral value and analysis can’t wait, or
• It is critical to always have the latest insight and extrapolation won’t do.

Use cases include predicting failures on assembly lines, predicting traffic flows in cities, predicting demand on power grids, detecting hackers, and understanding connection quality in mobile networks. All of them are characterized by a need to know now, and so require real-time processing of streaming data. Our goal is to enable real-time stream processing for Apache Pulsar in which analysis, learning and prediction are done on the fly, with continuous insights streamed back to the broker.
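
The loop described above (subscribe to events, analyze them on the fly, stream insights back to the broker) can be sketched in a few lines. This is only an illustration under stated assumptions: `InMemoryBroker`, `run_processor`, the topic names and the event fields are all hypothetical, and the broker is an in-memory stand-in rather than the Pulsar client API.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for a pub/sub broker: one FIFO queue per topic (hypothetical)."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, topic):
        """Return the next message on the topic, or None if it is empty."""
        queue = self.topics[topic]
        return queue.popleft() if queue else None


def run_processor(broker, events_topic, insights_topic):
    """Consume every pending event, keep a running mean per device in memory,
    and publish the updated mean back to the broker after each event."""
    count = defaultdict(int)
    mean = defaultdict(float)
    while (event := broker.poll(events_topic)) is not None:
        device, value = event["device"], event["value"]
        count[device] += 1
        # Incremental mean: no raw readings need to be stored.
        mean[device] += (value - mean[device]) / count[device]
        broker.publish(insights_topic, {"device": device, "mean": mean[device]})
    return dict(mean)


broker = InMemoryBroker()
for device, value in [("a", 10.0), ("b", 4.0), ("a", 20.0)]:
    broker.publish("events", {"device": device, "value": value})

final_means = run_processor(broker, "events", "insights")
insights = list(broker.topics["insights"])
```

The processor keeps only a count and a mean per device, so each raw event can be dropped as soon as it has been folded into that state.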


  1. Continuous Stream Processing with Pulsar and Swim • Simon Crosby, CTO, Swim • swim.ai
  2. SwimOS is an Apache 2.0 licensed platform that makes it easy to build applications that deliver continuous intelligence from streaming data, at scale • swimos.org
  3. PM25 Pollution • NOX Pollution
  4. Streaming Platforms: Apache Pulsar • Apache Kafka • Apache Beam • CNCF NATS • Amazon Kinesis • Google pub/sub • Azure Enterprise Data Bus • Salesforce Kafka • Confluent Cloud • … What they do: support pub/sub at scale • buffer data between pubs & subs • event-time ordered delivery • events stored in arrival order • don’t run applications. SwimOS is a stream processor that delivers continuous intelligence from streaming data.
  5. Stream Processors • Stream processors subscribe to a broker to analyze streaming event data • Their insights can be asynchronously consumed by publishing back to the broker • The broker offers a low-latency API that gives the stream processor events in real time • Pulsar does not control execution of the stream processor
  6. SwimOS is a Stateful, Real-time Stream Processor • Builds and auto-scales apps from real-world event data, creating a stateful graph that continuously computes, driven by data • Automates infrastructure operation • Load balances, secures, persists and auto-scales the application • Apps are easy to develop • Delivers unimaginable performance. Application: distributed, stateful, concurrent graph of Web Agents & real-time UIs. Infra: distributed, p2p mesh of instances on k8s using WebSockets.
  7. Major Mobile Provider • > 150M devices • > 10Gb/s of streaming data from Pulsar • Continuous analysis, aggregation & reduction • Millisecond latency • Pervasively real-time UI • Distributed across AZs
  8. Pulsar’s Many Pros • Event Processing: Filtering, Transformation, Counts / Windows, Alerts • Serverless is a great abstraction • SQL-style API • Storage tiering • Delivery guarantees • Multi-tenancy • Replication • Scaling. Diagram label: Database
  9. Challenges… • How many topics do you need?
  10. Challenges… Example event: engine_temp: 290, fan_temp: 188, coolant_vol: 25
  11. Challenges… • Databases don’t drive computation! (though in-memory is faster) • What DB architecture do you need? • Scaling / clustering / consistency … • Streaming analytics (#solved!) • Polling → not real-time. Continuous Intelligence demands: • Data-driven computation • Analysis in context, everywhere, concurrently • Stateful, in-memory, distributed • Pervasively real-time computation. Diagram: an Application serving several Clients; example event: engine_temp: 290, fan_temp: 188, coolant_vol: 25
  12. Palo Alto, CA
  13. 60 TB/day • ~600 TB/day (mostly ephemeral) • There’s more data than your cloud could store
  14. Intelligence is driven by (a flow of) state changes, not raw data
  15. Users Want Stateful, Continuous, Contextual Analysis • Streams are a sequence of state changes • They never stop… (so “store-then-analyze” is silly) • “Meaning” depends on granular contextual relationships • Applications always have to have an answer
  16. Introducing Swim Web Agents • SwimOS subscribes to event streams from real-world sources • It creates a stateful, concurrent web agent for each data source • Each web agent cleans, labels and analyzes data from its real-world twin • Agents dynamically link to related agents, creating a stateful in-memory graph • Containment, proximity… logical relationships, e.g. pod/cluster … • Computed relationships: correlated… • Linked web agents share their states in real time • Web Agents are vertices in the graph • Each continually computes on its own state & the state of its links as data flows over the graph, and streams its results in real time over its links • This is data-driven, stateful, continuous computation
  17. Web Agents Continuously Compute, Driven by Data • MapReduce • Graph Analytics • Learning & Prediction • Relational Analysis • Analyze data to determine state. Diagram: Real-world source and its Stateful Web Agent
  18. Web agents are stateful (agent says: “I’m green”)
  19. Raw data can typically be discarded (agent says: “I’m red”)
  20. Noisy / redundant updates are discarded (agent says: “I’m still red”)
  21. Streaming data auto-scales the application, composed of concurrent web agents, at low cost, in real time, as data arrives. Diagram labels: “I’m still red”, “I’m green”, “No push”, “No push”
  22. A Swim Application is an Active Graph
  23. A graph of linked active Web Agents (map scale: 1000m)
  24. Web Agents Link to Form a Computational Graph • Web agents continuously compute on their own state and the state of linked web agents, enabling granular contextual analysis on the fly ① SwimOS creates a web agent for each source in streaming data ② Agents interlink to reflect real-world relationships ③ Powerful operators for analysis, learning & prediction continuously compute on state & stream results
  25. E.g.: Unsupervised Training & Prediction • Back Propagation • Training • Δ Predicted / Observed
  26. A scaled application is a graph dynamically built from data • Objects are stateful and concurrent
  27. SwimOS Eliminates “the Stack” • Swim builds a stateful, distributed graph of concurrent web agents that statefully represent real-world sources, from streaming data* • Web agents collaborate to analyze, learn, predict and respond on the fly • They continuously stream real-time insights to UIs & applications. * Developer defines entities & their relationships as Java objects
  28. Pulsar and Swim: Better Together • Builds and auto-scales apps from real-world event data, creating a stateful graph that continuously computes, driven by data • Automates infrastructure operation • Load balances, secures, persists and auto-scales the application • Apps are easy to develop • Delivers unimaginable performance. Application: distributed, stateful, concurrent graph of Web Agents & real-time UIs. Infra: distributed, p2p mesh of instances on k8s using WebSockets.
  29. Questions? swim.ai
  30. Clustered Stream Processor Operation • Pulsar Broker: Events in, Insights out • Fabric: mesh of SwimOS instances • Distributed graph of web agents • Compute continuously as data flows over the graph • Web agent address space
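
The web agent behavior in slides 16 to 21 (one stateful agent per source, raw data reduced to a state, redundant updates not pushed, state shared over links) can be illustrated with a small sketch. It is plain Python, not the SwimOS API, which the slides say is Java-based. The class `WebAgent`, its methods, the "green"/"red" threshold rule and the value 250 are all assumptions made for illustration. Only the sample `engine_temp` reading of 290 comes from the slides.

```python
class WebAgent:
    """Hypothetical stand-in for a stateful agent that mirrors one data source."""

    def __init__(self, name, threshold=float("inf")):
        self.name = name
        self.threshold = threshold   # assumed rule: readings above this are "red"
        self.state = None            # "green" or "red"; raw readings are not kept
        self.links = []              # agents that receive this agent's state changes
        self.linked_states = {}      # latest known state of each linked source

    def link(self, other):
        """Stream this agent's state changes to `other`."""
        self.links.append(other)

    def on_event(self, reading):
        """Reduce a raw reading to a state; return True if a change was pushed."""
        new_state = "red" if reading > self.threshold else "green"
        if new_state == self.state:
            return False             # "I'm still red": redundant, so no push
        self.state = new_state
        for agent in self.links:
            agent.on_linked_state(self.name, new_state)
        return True

    def on_linked_state(self, source, state):
        """Record the latest state streamed from a linked agent."""
        self.linked_states[source] = state


engine = WebAgent("engine-1", threshold=250)
fleet = WebAgent("fleet")            # aggregates the states of linked agents
engine.link(fleet)

# Four raw readings arrive; only three of them change the engine's state.
pushes = [engine.on_event(temp) for temp in (180, 290, 295, 200)]
```

Here the third reading (295) leaves the engine "red", so nothing is streamed to `fleet`. The linked agent only ever sees state changes, never raw readings.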