Organizations that can make sense of the massive amounts of data produced by systems, customers, or partners will have a competitive edge. Ballerina Stream Processing brings real-time event stream processing to microservices, with intuitive SQL queries that let users filter, aggregate, and correlate data so they can make sense of it, make decisions, and act in real time in a distributed manner.
2. The Problem
○ Integration is not always about request-response.
○ Highly scalable systems use Event Driven Architecture to communicate asynchronously between multiple processing units.
○ Processing events from Webhooks, CDC, real-time ETLs, and notification systems falls under asynchronous event-driven systems.
3. What is a Stream?
An unbounded continuous flow of records (having the same format)
E.g., sensor events, triggers from Webhooks, messages from MQ
4. Why Stream Processing?
Doing continuous processing on the data, forever!
Such as:
○ Monitor and detect anomalies
○ Real-time ETL
○ Streaming aggregations (e.g., average service response time in the last 5 minutes)
○ Join/correlate multiple data streams
○ Detecting complex event patterns or trends
5. Stream Processing Constructs
○ Projection
  ○ Modifying the structure of the stream
○ Filter
○ Windows & Aggregations
  ○ Collection of streaming events over a time or length duration (last 5 min or last 50 events)
  ○ Viewed in a sliding or tumbling manner
  ○ Aggregated over the window (e.g., sum, count, min, max, avg)
○ Joins
  ○ Joining multiple streams
○ Detecting Patterns
  ○ Trends, non-occurrence of events
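To make the sliding versus tumbling distinction concrete, here is a minimal sketch in plain Python (not Ballerina) of a length window with a `sum` aggregation. The function names `tumbling_sums` and `sliding_sums` are illustrative and do not come from the deck.

```python
from collections import deque
from typing import Iterable, Iterator


def tumbling_sums(events: Iterable[int], length: int) -> Iterator[int]:
    """Emit one sum per non-overlapping batch of `length` events."""
    batch = []
    for value in events:
        batch.append(value)
        if len(batch) == length:
            yield sum(batch)
            batch = []  # start a fresh, non-overlapping window


def sliding_sums(events: Iterable[int], length: int) -> Iterator[int]:
    """Emit the sum of the last `length` events after every arrival."""
    window = deque(maxlen=length)  # the oldest event drops out automatically
    for value in events:
        window.append(value)
        yield sum(window)
```

A tumbling window produces one result per full batch, while a sliding window produces a result on every arriving event. In this sketch an incomplete final batch in `tumbling_sums` is simply not emitted.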
6. How to write Stream Processing Logic?
Use language libraries:
○ Have different functions for each stream processor construct.
○ Pros: You can use the same language for implementation.
○ Cons: Quickly becomes very complex and messy.
Use a SQL dialect:
○ Use easy-to-use SQL to script the logic.
○ Pros: Compact and easy to write the logic.
○ Cons: Need to write UDFs, which SQL does not support.
7. Solution for Programming Streaming Efficiently
Merging SQL and native programming
1. Consume events in Ballerina using standard language constructs
○ Via HTTP, HTTP2, WebSocket, JMS and more.
2. Generate streams out of the consumed data
○ Map JSON/XML/text messages into a record.
3. Define SQL to manipulate and process data in real time
○ If needed, use Ballerina functions within SQL
4. Generate output streams
5. Use standard language constructs to handle the output or send it to an endpoint
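The five steps above can be sketched in plain Python (not Ballerina) to show the flow of data. The record shape mirrors the `SensorData` type used later in the deck. The helper names `to_records`, `is_valid`, and `run_pipeline` are hypothetical, and the transport layer (HTTP, JMS, and so on) is replaced here by an in-memory list of JSON strings.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class SensorData:
    name: str
    reading: int


def to_records(messages: Iterable[str]) -> Iterator[SensorData]:
    """Step 2: map raw JSON messages into typed records."""
    for raw in messages:
        payload = json.loads(raw)
        yield SensorData(name=payload["name"], reading=int(payload["reading"]))


def is_valid(event: SensorData) -> bool:
    """A native function that the query logic calls (step 3)."""
    return event.reading > 0


def run_pipeline(messages: Iterable[str],
                 sink: Callable[[SensorData], None]) -> None:
    """Steps 3 to 5: filter the stream and pass each result to an output handler."""
    for event in to_records(messages):
        if is_valid(event):
            sink(event)
```

Passing `some_list.append` as `sink` collects the output in memory. In a real service the sink would forward each result to an endpoint.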
8. A Use Case
“Having lots of sensors, among all valid sensors, detect the sensors that have sent sensor readings greater than 100 in total within the last minute.”
11. Ballerina Stream Processing
type Alert record {
    string name; int total;
};
type SensorData record {
    string name; int reading;
};
○ Define input and output record types
12. Ballerina Stream Processing
type Alert record {
    string name; int total;
};
type SensorData record {
    string name; int reading;
};
function alertQuery(
        stream<SensorData> sensorDataStream,
        stream<Alert> alertStream) {
}
○ Define input and output record types
○ Function with input/output streams
13. Ballerina Stream Processing
type Alert record {
    string name; int total;
};
type SensorData record {
    string name; int reading;
};
function alertQuery(
        stream<SensorData> sensorDataStream,
        stream<Alert> alertStream) {
    forever {
    }
}
○ Define input and output record types
○ Function with input/output streams
○ Forever block
14. Ballerina Stream Processing
type Alert record {
    string name; int total;
};
type SensorData record {
    string name; int reading;
};
function alertQuery(
        stream<SensorData> sensorDataStream,
        stream<Alert> alertStream) {
    forever {
        from sensorDataStream
        where reading > 0
        window time(60000)
        select name, sum(reading) as total
        group by name
        having total > 100
    }
}
○ Define input and output record types
○ Function with input/output streams
○ Forever block
○ Among all valid sensors, select the ones with a total reading greater than 100 within the last minute
15. Ballerina Stream Processing
type Alert record {
    string name; int total;
};
type SensorData record {
    string name; int reading;
};
function alertQuery(
        stream<SensorData> sensorDataStream,
        stream<Alert> alertStream) {
    forever {
        from sensorDataStream
        where reading > 0
        window time(60000)
        select name, sum(reading) as total
        group by name
        having total > 100
        => (Alert[] alerts) {
            alertStream.publish(alerts);
        }
    }
}
○ Define input and output record types
○ Function with input/output streams
○ Forever block
○ Among all valid sensors, select the ones with a total reading greater than 100 within the last minute
○ Send Alert
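For readers who want to see what the query computes, the following plain-Python sketch (not Ballerina) approximates it: keep a 60-second sliding window of valid readings, maintain a running sum per sensor name, and emit an alert whenever a sensor's total exceeds 100. The explicit timestamps, the function name `alert_query`, and the rule that an event exactly 60000 ms old is expired are assumptions made for illustration. The actual engine may differ in when it emits and how it treats the boundary.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Alert:
    name: str
    total: int


def alert_query(events: Iterable[Tuple[int, str, int]],
                window_ms: int = 60_000,
                threshold: int = 100) -> Iterator[Alert]:
    """events are (timestamp_ms, name, reading) tuples in time order."""
    window = deque()           # events currently inside the time window
    totals = defaultdict(int)  # running sum of readings per sensor name
    for ts, name, reading in events:
        if reading <= 0:       # where reading > 0
            continue
        # Expire events that have fallen out of the window (assumed inclusive).
        while window and window[0][0] <= ts - window_ms:
            _, old_name, old_reading = window.popleft()
            totals[old_name] -= old_reading
        window.append((ts, name, reading))
        totals[name] += reading        # sum(reading) ... group by name
        if totals[name] > threshold:   # having total > 100
            yield Alert(name, totals[name])
```

The sketch only evaluates the window when a new valid event arrives, which is enough to illustrate the filter, window, group-by, and having steps.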
17. Joining Two Streams Over Time
// Detect when the raw material input falls below 5% of the rate of production consumption
forever {
    from productionInputStream window time(10000) as p
    join rawMaterialStream window time(10000) as r
    on r.name == p.name
    select r.name, sum(r.amount) as totalRawMaterial, sum(p.amount) as totalConsumed
    group by r.name
    having ((totalRawMaterial - totalConsumed) * 100.0 / totalRawMaterial) < 5
    => (MaterialUsage[] materialUsages) {
        materialUsageStream.publish(materialUsages);
    }
}
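The join step of this query can be sketched in plain Python (not Ballerina). Each side keeps its last 10 seconds of events, and every new arrival is matched by name against the other side's window. The function name `windowed_join`, the `"p"`/`"r"` side labels, and the explicit timestamps are illustrative assumptions. The `group by` and `having` aggregation is deliberately left out to keep the join itself visible.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str, float]  # (timestamp_ms, name, amount)


def windowed_join(arrivals: Iterable[Tuple[str, Event]],
                  window_ms: int = 10_000) -> Iterator[Tuple[Event, Event]]:
    """arrivals are (side, event) pairs in time order; side is "p" or "r".

    Yields (p_event, r_event) pairs that share a name and are both
    still inside their time windows when the later one arrives.
    """
    windows = {"p": deque(), "r": deque()}
    for side, event in arrivals:
        ts, name, _ = event
        # Expire old events from both windows before matching.
        for window in windows.values():
            while window and window[0][0] <= ts - window_ms:
                window.popleft()
        windows[side].append(event)
        other = "r" if side == "p" else "p"
        for candidate in windows[other]:
            if candidate[1] == name:  # on r.name == p.name
                yield (event, candidate) if side == "p" else (candidate, event)
```

Events that are more than a window apart never meet, which is what keeps the join state bounded on an unbounded stream.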
18. Detecting Patterns Within Streams
// Detect small purchase transaction followed by a huge purchase transaction
// from the same card within a day
forever {
    from every PurchaseStream where price < 20 as e1
    followed by PurchaseStream where price > 200 && e1.id == id as e2
    within 1 day
    select e1.id as cardId, e1.price as initialPayment, e2.price as finalPayment
    => (Alert[] alerts) {
        alertStream.publish(alerts);
    }
}
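As an illustration of how such a "followed by" pattern can be evaluated, here is a plain-Python sketch (not Ballerina). Every small purchase opens a partial match, and a later large purchase on the same card completes all partial matches that are still within the time limit. The name `detect_pattern`, the explicit timestamps, and the inclusive one-day boundary are assumptions. The real engine's matching and expiry rules may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Alert:
    card_id: str
    initial_payment: float
    final_payment: float


def detect_pattern(purchases: Iterable[Tuple[int, str, float]],
                   within_ms: int = DAY_MS) -> Iterator[Alert]:
    """purchases are (timestamp_ms, card_id, price) tuples in time order."""
    pending: List[Tuple[int, str, float]] = []  # small purchases awaiting a match
    for ts, card_id, price in purchases:
        # Drop partial matches that can no longer complete within the limit.
        pending = [e1 for e1 in pending if ts - e1[0] <= within_ms]
        if price > 200:
            still_pending = []
            for e1 in pending:
                if e1[1] == card_id:  # e1.id == id
                    yield Alert(card_id, e1[2], price)
                else:
                    still_pending.append(e1)
            pending = still_pending
        elif price < 20:
            pending.append((ts, card_id, price))
```

The list of partial matches is the only state the detector needs, and expiring it on every event keeps that state from growing without bound.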
19. Building Autonomous Services
○ Process incoming messages or locally produced events
○ Process events at the receiving node without sending them to a centralised system
○ Services can monitor themselves through inbuilt metric streams that produce events locally
○ Do local optimizations and take actions autonomously
20. Stream Processing at the Edge
○ Support microservices architecture
○ Summarize data at the edge.
○ When possible, take localized decisions.
○ Reduce the amount of data transferred to the central node.
○ Ability to run independently
○ Highly scalable
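A minimal plain-Python sketch (not Ballerina) of summarizing at the edge: rather than forwarding every raw reading, a node can send one compact summary per batch. The function name `summarize` and the chosen summary fields are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Iterable, List


def summarize(readings: Iterable[float]) -> Dict[str, float]:
    """Collapse a non-empty batch of readings into a single summary record."""
    values: List[float] = list(readings)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
    }
```

Sending one such record per batch, instead of every reading, is one way to cut the data transferred to the central node. The sketch raises an error on an empty batch, so a caller would need to skip empty batches.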
21. The Roadmap
○ Support stream processing to incorporate Ballerina’s custom functions.
○ Building Ballerina Stream Processing using Ballerina.
○ Support streams joining with tables.
○ Improve query language.
○ Support State Recovery.
○ Support High Availability.