Fabric is a scalable real-time stream processing framework developed by Ola. It is designed for high-throughput event ingestion from various sources and for writing events to different targets. Fabric provides batching of events, scalability, and reliability, and makes data available to other applications in near real time. It uses components such as sources, processors and executors, along with a compute framework, to orchestrate event flows. Fabric has proven to reliably handle over 2.5 million events per second for applications like fraud detection at Ola.
2. What is Fabric?
Fabric is a scalable, practical and reliable real-time stream processing
framework designed for easy operability and extension.
Fabric has proven to work very well for:
● High velocity multi-destination event ingestion with guaranteed
persistence.
● Rule/filter-based real-time triggers for advertising/broadcast
● Online Fraud detection
● Real-time pattern matching
● Streaming analytics
3. The Problem
● Primary motivation
○ Streaming millions of messages per second
○ Connectivity to different sources - Kafka, MySQL etc.
○ Write to different targets - DB, queue, API - or publish to other
higher-level applications
○ Near real time
● Desirable properties from the framework
○ High throughput - support for batching of events
○ Data sanity - avoiding datasets which make no sense
○ Make data available for other applications to consume
○ Scalability and Data Reliability
○ Provide easy development and deployment
○ Resource effectiveness
7. Fabric Compute and Executor continued...
Fabric-executor
Responsible for:
● Launching, monitoring and managing deployed computations
● 1:1 relationship between a computation instance and a fabric-executor
process
● Each fabric executor is a single JVM process within a Docker container
8. Fabric Terminologies
● Compute Framework
○ Real-time event processing framework
○ Core event orchestration
○ Perform user-defined operations
● EventSet
○ Collection (of configurable size) of events
○ Basic transmission unit within the computation
● Computation/Topology
○ A user-created pipeline for data flow, built from fabric components
○ Components can be of two types, Source and Processor
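The EventSet idea - group events into fixed-size batches and pass the batch, not the single event, between components - can be illustrated in plain Java. This is an illustration only, not Fabric's actual EventSet API; the `Batcher` class and its batch size are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the EventSet idea: events are grouped into
// fixed-size batches, and the batch (not the single event) is the
// unit handed between components. The batch size is configurable.
public class Batcher {
    static <T> List<List<T>> batch(List<T> events, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < events.size(); i += batchSize) {
            batches.add(events.subList(i, Math.min(i + batchSize, events.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = batch(List.of(1, 2, 3, 4, 5), 2);
        System.out.println(batches); // [[1, 2], [3, 4], [5]]
    }
}
```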
9. Fabric Terminologies continued...
● Source
○ Feeds event sets into the computation
○ Manages the QoS of the events ingested into the computation
● Processor
○ Performs computation on an incoming event set
○ Emits an outgoing event set
○ Types:
■ Streaming Processor: triggered whenever an event set is sent
to the processor.
■ Scheduled Processor: triggered periodically, whenever a fixed
period of time elapses.
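The scheduled trigger model can be sketched in plain Java, independent of Fabric's processor API: a task fires on a fixed period whether or not events have arrived. The `ScheduledTriggerDemo` class below is a hypothetical illustration, not Fabric code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Plain-Java illustration of the Scheduled Processor trigger model:
// a task fires on a fixed period, regardless of whether events arrived.
public class ScheduledTriggerDemo {
    // Runs `task` every `periodMillis` ms until it has fired `times` times.
    static void runPeriodically(Runnable task, long periodMillis, int times)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(times);
        scheduler.scheduleAtFixedRate(() -> {
            task.run();
            remaining.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        remaining.await();
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Each "tick" is where a Scheduled Processor would flush whatever
        // events it accumulated since the previous tick.
        runPeriodically(() -> System.out.println("tick: flush accumulated events"), 100, 3);
    }
}
```

A Streaming Processor, by contrast, is invoked on event arrival, as the `SplitterProcessor` sample later in the deck shows.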
10. Management And Deployment
Fabric Manager
● A Dropwizard web service that runs inside a Docker container
● Provides APIs to register components - sources and processors
● Provides APIs to perform CRUD on computations
● Management APIs to deploy, scale, get, delete computations
● The application resource exposes APIs for deployment-related operations on computations.
● Deployment environment: Marathon and Mesos
Sample Resources
● Components. eg: POST /v1/components
○ Other APIs - get, search, register etc
● Computation. eg: POST /v1/computations/{tenant}
○ Other APIs - get, search, update, deactivate etc
● Application. eg: POST /v1/applications/{tenant}/{computation_name}
○ Other APIs - get, scale, suspend etc
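A client for the computation resource might look like the following sketch, using Java 11's built-in HTTP client. The host, port, tenant name and the `DeployRequestDemo` helper are all assumptions for the example; only the `POST /v1/computations/{tenant}` path comes from the slide:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DeployRequestDemo {
    // Builds (but does not send) a request against the Fabric Manager's
    // computation resource. Host, port and tenant name are assumptions.
    static HttpRequest createComputationRequest(String host, String tenant, String specJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + "/v1/computations/" + tenant))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(specJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = createComputationRequest("localhost:8080", "demo",
                "{\"name\": \"word-count-print-topology\"}");
        System.out.println(req.method() + " " + req.uri());
        // → POST http://localhost:8080/v1/computations/demo
    }
}
```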
13. Create Components using maven archetype
Maven archetype command -
mvn archetype:generate \
  -DarchetypeGroupId=com.olacabs.fabric \
  -DarchetypeArtifactId=fabric-processor-archetype \
  -DarchetypeVersion=1.0.0-SNAPSHOT \
  -DartifactId=<artifact_id_of_your_project> \
  -DgroupId=<group_id_of_your_project> \
  -DinteractiveMode=true
Example -
mvn archetype:generate \
  -DarchetypeGroupId=com.olacabs.fabric \
  -DarchetypeArtifactId=fabric-processor-archetype \
  -DarchetypeVersion=1.0.0-SNAPSHOT \
  -DartifactId=fabric-my-processor \
  -DgroupId=com.olacabs.fabric \
  -DinteractiveMode=true
What it does -
Creates a pom project for the processor with the latest versions of the compute and other related
jars.
Creates boilerplate code, with examples, for scheduled and streaming processors. You can modify the
example Java files as per your needs.
14. Sample Fabric Source
/**
* A Sample Source Implementation which generates
* Random sentences.
*/
@Source(namespace = "global", name = "random-sentence-source", version = "0.1",
        description = "Sample source", cpu = 0.1, memory = 64, requiredProperties = {}, optionalProperties = {"randomGeneratorSeed"})
public class RandomSentenceSource implements PipelineSource {
Random random;
String[] sentences = {
"A quick brown fox jumped over the lazy dog",
"Life is what happens to you when you are busy making other plans"
. . .
. . .
};
@Override
public void initialize(final String instanceName, final Properties global, final Properties local,
final ProcessingContext processingContext, final ComponentMetadata componentMetadata) throws Exception {
int seed = ComponentPropertyReader.readInteger(local, global, "randomGeneratorSeed", instanceName, componentMetadata, 42);
random = new Random(seed);
}
15. Sample Fabric Source continued...
@Override
public RawEventBundle getNewEvents() {
return RawEventBundle.builder().events(
getSentences(5).stream().map(sentence -> Event.builder()
.id(random.nextInt())
.data(sentence.toLowerCase())
.build())
.collect(Collectors.toCollection(ArrayList::new)))
.meta(Collections.emptyMap())
.partitionId(Integer.MAX_VALUE)
.transactionId(Integer.MAX_VALUE)
.build();
}
private List<String> getSentences(int n) {
List<String> listOfSentences = new ArrayList<>();
for (int i = 0; i < n; i++) {
listOfSentences.add(sentences[random.nextInt(sentences.length)]);
}
return listOfSentences;
}
}
16. Sample Fabric Processor
/**
* A sample Processor implementation which
* Gets the data (sentences) and splits based on delim.
*/
@Processor(namespace = "global", name = "splitter-processor", version = "0.1", cpu = 0.1, memory = 32,
description = "A processor that splits sentences by a given delimiter", processorType = ProcessorType.EVENT_DRIVEN,
requiredProperties = {}, optionalProperties = {"delimiter"})
public class SplitterProcessor extends StreamingProcessor {
private String delimiter;
@Override
public void initialize(final String instanceName, final Properties global, final Properties local,
final ComponentMetadata componentMetadata) throws InitializationException {
delimiter = ComponentPropertyReader.readString(local, global, "delimiter", instanceName, componentMetadata, ",");
}
17. Sample Fabric Processor continued...
@Override
protected EventSet consume(final ProcessingContext processingContext, final EventSet eventSet) throws ProcessingException {
List<Event> events = new ArrayList<>();
eventSet.getEvents().stream()
.forEach(event -> {
String sentence = (String) event.getData();
String[] words = sentence.split(delimiter);
events.add(Event.builder().data(words).id(Integer.MAX_VALUE).properties(Collections.emptyMap()).build());
});
return EventSet.eventFromEventBuilder()
.partitionId(eventSet.getPartitionId())
.events(events)
.build();
}
@Override
public void destroy() {
// do some cleanup if necessary
}
}
18. Sample Computation / Topology
A sample topology -
● Selects a random sentence from an in-memory list
● Splits the sentence based on a delimiter
● Counts the words
● Prints the counts on the console
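The word-count step is the only one not shown in the earlier slides. Its core logic, stripped of Fabric's processor API, might look like the following sketch; the `WordCount` class and its methods are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Core logic of a hypothetical word-count step: given arrays of words
// emitted by the splitter, accumulate a running count per word.
public class WordCount {
    private final Map<String, Integer> counts = new HashMap<>();

    // Consume one "event" (an array of words) and update the counts.
    public void consume(String[] words) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        WordCount wc = new WordCount();
        wc.consume("a quick brown fox jumped over the lazy dog".split(" "));
        wc.consume("a lazy dog".split(" "));
        System.out.println(wc.counts().get("dog")); // 2
        System.out.println(wc.counts().get("fox")); // 1
    }
}
```

In a real topology this state would live inside a processor, and a downstream console-printer processor would render the counts.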
19. Sample Computation / Topology Spec continued...
{
  "name": "word-count-print-topology",
  "sources": [
    {
      "id": "random-sentence-source",
      "meta": { /* … meta for source */ },
      "properties": { /* … properties for source */ }
    }
  ],
  "processors": [
    {
      "id": "splitter-processor",
      "meta": { /* … meta for processor */ },
      "properties": { /* … properties for processor */ }
    },
    {
      "id": "word-count-processor",
      "meta": { /* … meta for processor */ },
      "properties": { /* … properties for processor */ }
    },
    {
      "id": "console-printer-processor",
      "meta": { /* … meta for processor */ },
      "properties": { /* … properties for processor */ }
    }
  ],
  "connections": [
    {
      "fromType": "SOURCE",
      "from": "random-sentence-source",
      "to": "splitter-processor"
    },
    {
      "fromType": "PROCESSOR",
      "from": "splitter-processor",
      "to": "word-count-processor"
    },
    {
      "fromType": "PROCESSOR",
      "from": "word-count-processor",
      "to": "console-printer-processor"
    }
  ],
  "properties": { /* … global properties */ }
}
29. Fabric At Ola Stats
Ola currently receives ~2.5 million events per second from its end users - driver and customer
apps - as well as internally generated events. Multiple real-time use cases stem from these events,
including:
● Fraud detection and prevention
● Just-in-time notifications
● Security alerts
● Real-time reporting
● Generating user specific offers
Fabric has been in production at Ola for 10 months now, powering these applications in addition to
acting as a raw event ingestion and pub-sub system.
30. Fabric At Ola Stats continued...
Key stats -
● Event streams handled: 375+
● Number of live topologies: 160+
● Ingestion rate: ~2.5 million events per second on 10 nodes
● Node config: c4.8xlarge machines
31. Fabric Summary Points
1. Developed in Java.
2. Highly scalable, with guaranteed availability.
3. Reliable - framework-level guarantees against message loss, support for replay, multiple
sources and complex tuple trees.
4. Event batching is supported at the core level.
5. Source-level event partitioning used as the unit of scalability.
6. Uses capabilities provided by Docker to ensure strong application isolation.
7. On-the-fly topology creation and deployment by dynamically assembling topologies using
components directly from Artifactory.
8. Inbuilt support for custom metrics and custom code-level healthchecks to catch application
failures right when they happen.
9. Easy development and deployment.
And many more...
32. Links
Fabric was recently open-sourced on GitHub.
● Github link: https://github.com/olacabs/fabric
● Documentation: https://github.com/olacabs/fabric/blob/develop/README.md
Please Contribute…!