Today’s products – devices, software and services – are well instrumented, letting users, vendors and service providers gather deep insight into how they are used, when they need repair, and much more. Ensuring that products can rapidly adapt to a constantly changing environment and changing customer needs requires that the events they generate be analyzed continuously and in context. Insights can be synthesized from many sources in context: geospatial and proximity, trajectory, and even predicted future states. Customers, vendors and service providers need to analyze, learn, and predict directly from streaming events, because data volumes are huge and automated responses must often be delivered in milliseconds. To achieve insights quickly, we need to build models on the fly whose predictions are accurate and in sync with the real world, often to support automation. Many insights depend on analyzing the joint evolution of data sources whose behavior is correlated in time or space.
In this talk we present Swim, an Apache 2.0 licensed platform for continuous intelligence applications. Swim builds a fluid model of data sources and their changing relationships in real time; Swim applications analyze, learn and predict directly from event data, and integrate with Apache Kafka for event streaming. Developers need nothing more than Java skills. Swim deploys natively or in containers on k8s, with the same code in each instance. Instances link to build an application-layer mesh that facilitates distribution and massive scale without sacrificing consistency. We will present several continuous intelligence applications in use today that depend on real-time analysis, learning and prediction to power automation and deliver responses that are in sync with the real world.
We will show how easy it is to build, deploy and run distributed, highly available event streaming applications that analyze data from hundreds of millions of sources - petabytes per day. The architecture is intuitively appealing and blazingly fast.
2. Swim is an Apache 2.0 licensed platform that makes it
easy to build applications that deliver continuous
intelligence from streaming events, at massive scale
swimos.org
3. Swim
• Auto-build & scale apps
directly from streaming data
• Apps are a million times faster
• They need 90% less infra
“A bit of Dev and no Ops”
4. Swim is a Stateful, Real-time Stream Processor
• Builds and scales apps directly from event data, creating a graph of stateful,
concurrent actors that continuously compute – driven by data
• Automates distributed application infrastructure operation
• Load balances, secures, persists and auto-scales applications
➢ Infrastructure: a distributed, p2p mesh (fabric) of instances on k8s, connected using WebSockets
➢ App: a distributed, stateful graph of concurrent actors, streaming APIs & real-time browser UIs
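The "graph of stateful, concurrent actors that continuously compute – driven by data" idea can be sketched in plain Java. This is an illustrative sketch, not the SwimOS API: each actor owns its state and processes events one at a time on its own single-threaded mailbox executor, so actor state never needs locking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

// Illustrative sketch (not the SwimOS API): a stateful actor whose
// mailbox serializes all event processing onto one thread.
class StatefulActor<S, E> {
  private S state;                          // actor-owned state
  private final BiFunction<S, E, S> step;   // f(old_state, event) -> new_state
  private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

  StatefulActor(S initial, BiFunction<S, E, S> step) {
    this.state = initial;
    this.step = step;
  }

  // Events may arrive from any thread; the mailbox serializes them.
  void tell(E event) {
    mailbox.execute(() -> state = step.apply(state, event));
  }

  // Drain pending events, then read the current state.
  S ask() {
    CountDownLatch done = new CountDownLatch(1);
    final Object[] out = new Object[1];
    mailbox.execute(() -> { out[0] = state; done.countDown(); });
    try {
      done.await();
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
    @SuppressWarnings("unchecked") S s = (S) out[0];
    return s;
  }

  void shutdown() { mailbox.shutdown(); }
}
```

An application would hold many such actors and wire their outputs to other actors' mailboxes, forming the graph; the mailbox-per-actor design is what lets stateful computation stay concurrent without locks.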
8. A Real-Time Customer Service Challenge
“WTF”?
“That’s impossible…”
“Customer-centric, not service-centric”
9. Challenges
Huge broker clusters (>100 nodes)
Even bigger app clusters (>400 nodes)
Slow (~10 hours)
Vast amounts of data (5 PB/day)
10. Using SwimOS
Use 90% less infrastructure (40 nodes vs 400)
Apps are easy & auto-scale with no Ops
Do data science on live data
Answers a million times faster (10ms vs 10h)
11. Mesh of SwimOS Instances
[Architecture diagram: a mesh of SwimOS instances forms a coherent fabric. Each instance hosts a distributed actor runtime whose actors continuously compute & stream; events flow through the application pipeline, which performs continuous analysis, learning & prediction and streams insights out.]
12. How Swim Works
• Build a simple Java app & deploy instances using Kubernetes
• Swim uses streaming data to build a graph of stateful, concurrent Web Agents – one per data source – all in-memory, like smart “digital twins” of things
• Web Agents dynamically link to related Agents to share state
• They continuously analyze, learn & predict from their state and the states of linked Agents – driven by data
• They react in real-time, and stream their state changes over their links
• … continuously streaming insights to web UIs, storage, applications
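The "one Web Agent per data source, all in-memory" pattern above can be sketched in plain Java (an illustrative sketch, not the SwimOS API): a registry materializes a small stateful "twin" the first time an event for a given source arrives, then routes later events to it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch (not the SwimOS API): one in-memory agent per
// data source, created on first contact from the stream itself.
class AgentRegistry {
  private final Map<String, DeviceAgent> agents = new ConcurrentHashMap<>();

  // Route an event to its agent, creating the agent if it is new.
  void onEvent(String sourceId, double reading) {
    agents.computeIfAbsent(sourceId, DeviceAgent::new).update(reading);
  }

  DeviceAgent agent(String sourceId) { return agents.get(sourceId); }
  int size() { return agents.size(); }
}

// A tiny "digital twin": keeps a running average of its source's readings.
class DeviceAgent {
  final String id;
  private int count;
  private double mean;

  DeviceAgent(String id) { this.id = id; }

  synchronized void update(double reading) {
    count++;
    mean += (reading - mean) / count;  // incremental mean, no history kept
  }

  synchronized double mean() { return mean; }
}
```

The agent and event names here are hypothetical; the point is that the graph of agents self-assembles from the stream rather than from a predefined schema.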
13. Swim: for Things
• Swim creates a stateful, concurrent actor - a Web Agent -
for each data source in streaming data - that continuously
analyzes data from its real-world “twin”
• Each Web Agent (dynamically) links to related agents,
creating a fluid in-memory graph that tracks complex
relationships
• Containment, proximity, “neighbor” … “is approaching”
• Computed: “correlated to” … “predicted to be within”
• Linked Agents use each other’s states to continuously
analyze, learn and predict…
• They can instantly react, and stream their state to apps,
real-time UIs, data lakes…
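A relationship like “near to” from the list above can be sketched as dynamic links that are re-evaluated whenever a twin’s position changes. This is a plain-Java sketch with hypothetical names, not the SwimOS API; it only illustrates how an in-memory graph can track a fluid real-world relationship.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch (not the SwimOS API): an agent that links to
// neighbors within a radius, so its links mirror "near to" as it moves.
class GeoAgent {
  final String id;
  double x, y;
  final Set<GeoAgent> links = new HashSet<>();  // current "near to" links

  GeoAgent(String id, double x, double y) {
    this.id = id; this.x = x; this.y = y;
  }

  // On a position update, re-evaluate proximity links against candidates.
  void moveTo(double nx, double ny, Iterable<GeoAgent> candidates, double radius) {
    x = nx;
    y = ny;
    for (GeoAgent other : candidates) {
      if (other == this) continue;
      boolean near = Math.hypot(x - other.x, y - other.y) <= radius;
      if (near) { links.add(other); other.links.add(this); }
      else      { links.remove(other); other.links.remove(this); }
    }
  }
}
```

A real system would use a spatial index rather than scanning all candidates, but the linking/unlinking behavior is the same.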
16. Developer Model
Web Agents are concurrent, distributed Java actors
– ActorID is a URI – hence Web Agent
– Use WARP on HTTP/2 + CRDTs to ensure coherence in 1/2 RTT
• Lanes are object members
– Receive data
– Hold code and state, e.g. “average”
– Each lane applies f(data, old_state) → new_state to its code & state
• Links are relationships between Web Agents
– Express relationships
• “Schema” derived, e.g. contains
• Computed: maps, joins, correlations, membership, “predicted to be”, geospatial
– Build a (distributed) graph
– A link to a lane lets the linker observe lane state
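The lane idea – state plus an update function f(data, old_state) → new_state, observed by links – can be sketched in plain Java. This is a minimal sketch with hypothetical names, not the SwimOS lane API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Illustrative sketch (not the SwimOS API): a "lane" holds code & state,
// applies f(data, old_state) -> new_state on each input, and streams the
// new state to every link observing it.
class Lane<D, S> {
  private S state;
  private final BiFunction<D, S, S> f;                        // the lane's code
  private final List<Consumer<S>> links = new ArrayList<>();  // observers

  Lane(S initial, BiFunction<D, S, S> f) {
    this.state = initial;
    this.f = f;
  }

  // A link to a lane lets the linker observe lane state.
  void link(Consumer<S> observer) { links.add(observer); }

  // Receive data, update state, stream the change over all links.
  void onData(D data) {
    state = f.apply(data, state);
    for (Consumer<S> link : links) link.accept(state);
  }

  S get() { return state; }
}
```

Linking another agent’s handler via `link(...)` is the sketch’s stand-in for a downlink: the linker sees every state change as it happens, rather than polling.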
17. Continuously Current Materialized Views
[Diagram: streaming inputs (i1, i2, … in) are transformed, state St-1 → St, into streaming outputs (o1, … oj).]
• A Web Agent can analyze millions of (concurrent) updates to Agents it is linked to – millions of links (in DB terms: column analysis across rows, e.g. for analytics)
• An update does not immediately trigger recomputation
– A changed input invalidates dependent outputs
– Recomputation is de-bounced to allow bursts of state changes
– Use timers to enforce latency bounds
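The invalidate/de-bounce/latency-bound policy above can be sketched deterministically in plain Java (an illustrative sketch, not the SwimOS scheduler): a changed input only marks the view dirty; recomputation waits for the burst to quiet down, but a latency bound forces it even if changes keep arriving. An explicit tick stands in for a timer so the behavior is easy to test.

```java
// Illustrative sketch (not the SwimOS API) of de-bounced recomputation
// with a latency bound, driven by a simulated clock.
class DebouncedView {
  private final long quietTicks;   // recompute after this many quiet ticks
  private final long maxLatency;   // ...but never wait longer than this
  private boolean dirty = false;
  private long firstChange, lastChange;
  private long now = 0;
  int recomputations = 0;          // how many times we actually recomputed

  DebouncedView(long quietTicks, long maxLatency) {
    this.quietTicks = quietTicks;
    this.maxLatency = maxLatency;
  }

  // A changed input invalidates the output; no recomputation yet.
  void invalidate() {
    if (!dirty) firstChange = now;
    dirty = true;
    lastChange = now;
  }

  // Advance the simulated clock one tick and recompute if due.
  void tick() {
    now++;
    if (!dirty) return;
    boolean quiet = now - lastChange >= quietTicks;
    boolean overdue = now - firstChange >= maxLatency;
    if (quiet || overdue) {
      recomputations++;  // recompute the materialized view here
      dirty = false;
    }
  }
}
```

In a live system the tick would be a real timer; the trade-off is the usual one: a longer quiet window coalesces more of a burst into one recomputation, while the latency bound caps staleness.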
18. Swim Applications Are DAGs
• An application self-assembles as a DAG – on the fly – from events
• Each vertex is a stateful, concurrent actor – a Web Agent
• They receive a continuous stream of events and stream their state changes as CRDTs, in sync with the real world
• They analyze, learn, and predict as events flow
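To make the CRDT point concrete, here is one of the simplest CRDTs, a grow-only counter (G-Counter), in plain Java. This is a generic sketch of the data type, not SwimOS's wire format: each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge no matter the order or repetition of merges.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a grow-only counter CRDT. Merge is commutative,
// associative and idempotent, so replicated state stays coherent.
class GCounter {
  private final String replicaId;
  private final Map<String, Long> counts = new HashMap<>();

  GCounter(String replicaId) { this.replicaId = replicaId; }

  // Each replica only ever increments its own slot.
  void increment() { counts.merge(replicaId, 1L, Long::sum); }

  // Merge by per-replica max; safe to apply in any order, any number of times.
  void merge(GCounter other) {
    other.counts.forEach((id, n) -> counts.merge(id, n, Long::max));
  }

  long value() {
    return counts.values().stream().mapToLong(Long::longValue).sum();
  }
}
```

Streaming state changes as CRDTs is what lets the mesh replicate state peer-to-peer without coordinating on delivery order.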
23. Continuous Intelligence for Apache Kafka
• Analyze, learn & predict on-the-fly, driven by events
• Actors respond instantly – in sync with the real world
• Continuously evaluate complex parametric functions and causal relationships, e.g. “near to” or “predicted to be within”
• Build apps directly from streaming data by linking streaming actors into a fluid graph of real-world relationships
24. Continuous Intelligence for Apache Kafka
• A little Dev but no Ops
• Do data science on live data
• Use 90% less infrastructure
• Deliver answers a million times faster…