Pattern matching over event streams is increasingly employed in many areas, including financial services and clickstream analysis. Flink, as a true stream processing engine, emerges as a natural candidate for these use cases. In this talk, we will present FlinkCEP, a library for Complex Event Processing (CEP) based on Flink. At the conceptual level, we will see the different patterns the library can support, present the main building blocks we implemented to support them, and discuss possible future additions that will further enhance the coverage of the library. At the practical level, we will show how the integration of FlinkCEP with Flink allows the former to take advantage of Flink's rich ecosystem (e.g. connectors) and its stream processing capabilities, such as support for event-time processing, exactly-once state semantics, fault tolerance, savepoints and high throughput.
19. FlinkCEP Individual Patterns
Unique Name
Condition : which elements to accept
• Simple, e.g. shape == rectangle
• Iterative, e.g. rectangle.surface < triangle.surface
Quantifiers (or not)
• Looping/Optional oneOrMore(),times(#),optional()
[Figure: a complex Pattern composed of individual patterns P1 and P2]
20. FlinkCEP Complex Patterns
Combine Individual Patterns
Contiguity Conditions
• how to select relevant events given an input mixing relevant and irrelevant events
Time Constraints
• within(time) e.g. all events have to come within 24h
[Figure: a complex Pattern composed of individual patterns P1 and P2]
29. FlinkCEP Summary
Quantifiers
• oneOrMore(), times(), optional()
Conditions
• Simple & Iterative
Time Constraints
• Event and Processing time
Different Contiguity Constraints
• Strict, relaxed, non-deterministic relaxed, NOT
30. Trace all shipments which:
• start at location A
• have at least 5 stops
• end at location B
• within the last 24h
Running Example: retailer
[Figure: locations A and B with intermediate stops M1 to M5]
31. Trace all shipments which:
• start at location A
• have at least 5 stops
• end at location B
• within the last 24h
Observation A: Individual Patterns
Start: ev.from == A
Mid: ev[i].from == ev[i-1].to
End: ev.to == B && size("mid") >= 5
32. Observation B: Quantifiers
Start/End: single event
Middle: multiple events
• .oneOrMore()
33. Observation C: Conditions
Start -> Simple
• properties of the event
Middle/End -> Iterative
• depend on previous events
34. Observation D: Time Constraints
Trace all shipments which:
• start at location A
• have at least 5 stops
• end at location B
• within the last 24h
35. Observation E: Contiguity
We opt for relaxed contiguity
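Observations A to E can be put together in a small sketch. This is a toy Python model of the running example, not FlinkCEP code: the `Shipment` type, the `trace` function and its parameters are hypothetical names, events are assumed to arrive sorted by timestamp, and the matcher is greedy (one match per start event), which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    src: str         # ev.from on the slides
    dst: str         # ev.to on the slides
    ts_hours: float  # event time, in hours


def trace(events, start_loc="A", end_loc="B", min_mid=5, window_hours=24.0):
    """Return matches as lists [start, mid..., end] using relaxed contiguity."""
    matches = []
    for i, start in enumerate(events):
        if start.src != start_loc:  # Start: simple condition
            continue
        mid = []
        for ev in events[i + 1:]:
            if ev.ts_hours - start.ts_hours > window_hours:
                break  # time constraint violated, give up on this start
            last = mid[-1] if mid else start
            if ev.dst == end_loc and len(mid) >= min_mid:
                matches.append([start, *mid, ev])  # End: iterative condition
                break
            if ev.src == last.dst:  # Mid: iterative condition, oneOrMore
                mid.append(ev)
            # otherwise: relaxed contiguity, ignore the non-matching event
    return matches


route = [
    Shipment("A", "M1", 0), Shipment("M1", "M2", 1), Shipment("X", "Y", 1.5),
    Shipment("M2", "M3", 2), Shipment("M3", "M4", 3), Shipment("M4", "M5", 4),
    Shipment("M5", "M6", 5), Shipment("M6", "B", 6),
]
```

Calling `trace(route)` on this input yields a single match that skips the unrelated X to Y shipment. Moving the last event past the 24h window yields no match.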
45. Stream Processing and Apache Flink®'s approach to it
@StephanEwen
Apache Flink PMC
CTO @ data Artisans
FLINK FORWARD IS COMING BACK TO BERLIN
SEPTEMBER 11-13, 2017
BERLIN.FLINK-FORWARD.ORG
Hello everyone and thanks for coming!
My name is Kostas Kloudas and I am here to talk to you about FlinkCEP, a library for complex event processing built atop Apache Flink.
A little bit about myself: I am a committer for Apache Flink and a software engineer at data Artisans, the original creators of Apache Flink and the providers of the dA Platform.
So without further ado, let’s start by seeing what CEP, or Complex Event Processing, is.
Complex Event Processing is the “art” of detecting event patterns, over continuous streams of data, often arriving out of order. To visualize it....
Imagine that you have a stream containing elements of different shapes and colors, as shown in the figure...
And you want to detect sequences of events where a triangle follows a rectangle of the SAME color. A CEP library would take the input and the pattern, and return the matching sequences, ...
As shown in the figure.
Many interesting use cases fall into the category of complex event processing problems. To name a few, we have use cases from IoT....
We saw the basic idea behind CEP; now let’s see what stream processing is, and why a stream processor provides a good substrate for building a CEP library.
Stream processing, in its simplest form, stands for computations on never-ending streams of events.
Distributed stream processing implies that the aforementioned computation is spread across many machines.
Stateful distributed stream processing has the additional property that the result depends on the history of the stream. To do this, the stream processor must be able to keep state in a fault-tolerant manner. Most of the interesting computations are stateful; in fact, even a simple event counter needs to keep state. This is where Flink shines.
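To make the point about state concrete, here is a minimal Python sketch of a per-key event counter. It is not Flink code: the name `running_counts` and the in-memory dictionary are assumptions for illustration, and a real stream processor would also have to checkpoint this state to be fault tolerant.

```python
from collections import defaultdict


def running_counts(stream):
    """Yield (key, count so far) for every event key in the stream."""
    counts = defaultdict(int)  # the operator's state: one counter per key
    for key in stream:
        counts[key] += 1  # the result depends on everything seen before
        yield key, counts[key]


if __name__ == "__main__":
    for key, count in running_counts(["rectangle", "triangle", "rectangle"]):
        print(key, count)
```

Even this counter cannot be computed from the current event alone, which is why losing the dictionary on failure would produce wrong results.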
From the above, it is not difficult to see that stream processors are a natural fit for CEP.
This was the main motivation behind the first implementation of FlinkCEP, more than a year ago, and
this talk focuses on the current capabilities of FlinkCEP, (slide)
... a library that takes your input stream and your desired pattern and returns you the matching event sequences.
So what does FlinkCEP offer? We will start by describing the building blocks the library offers for defining a complex pattern, before describing how to integrate it in your program.
Pattern definition: taking our previous pattern, where we wanted to find all rectangles followed by triangles, we see that (slide)
A complex pattern is composed of individual patterns, or simply patterns, each of which searches for a specific type of event. In our case, we have two individual patterns, one searching for rectangles and another searching for triangles.
These individual patterns are combined into a complex one by specifying the contiguity condition between them. We will come back to this later, but in a nutshell, contiguity describes how to select relevant events given an input mixing relevant and irrelevant events. In our example, we say that the triangle should strictly follow the rectangle.
Given that complex patterns are composed of individual patterns, we describe those first, before showing how to combine them.
Individual Patterns must have a unique name and for each one of them we can define a condition based on which it accepts relevant events.
This condition can depend on properties of the event itself, in which case it is a SIMPLE condition, or on properties or statistics over a subset of previously accepted events, in which case it is an ITERATIVE condition.
In addition to the condition, a pattern can also have quantifiers. By default, when an individual pattern appears in a complex pattern, FlinkCEP expects the described type of event to appear exactly once in order to have a match. This is a singleton pattern. In our case, we expect exactly one rectangle, followed by exactly one triangle. The supported quantifiers are oneOrMore() for use cases where a specific type of event is expected “at least once”, times() when we want it to appear a specified number of times, and optional() if the event is optional.
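The ingredients of an individual pattern (name, condition, quantifier) can be modeled in a few lines. The following Python sketch is a toy model under stated assumptions, not the FlinkCEP API: `IndividualPattern`, `one_or_more`, `allows` and the two example conditions are hypothetical names, and events are plain dictionaries.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

Event = dict
# A condition sees the candidate event and the events accepted so far.
Condition = Callable[[Event, List[Event]], bool]


@dataclass(frozen=True)
class IndividualPattern:
    name: str                     # unique name within the complex pattern
    condition: Condition
    min_times: int = 1            # default: singleton, exactly one event
    max_times: Optional[int] = 1  # None means unbounded

    def one_or_more(self):
        return replace(self, min_times=1, max_times=None)

    def times(self, n):
        return replace(self, min_times=n, max_times=n)

    def optional(self):
        return replace(self, min_times=0)

    def allows(self, count):
        """True if `count` accepted events satisfy the quantifier."""
        upper_ok = self.max_times is None or count <= self.max_times
        return count >= self.min_times and upper_ok


# Simple condition: looks only at the event itself.
def is_rectangle(ev, accepted):
    return ev["shape"] == "rectangle"


# Iterative condition: compares against previously accepted events.
def is_larger_triangle(ev, accepted):
    return ev["shape"] == "triangle" and all(
        prev["surface"] < ev["surface"] for prev in accepted
    )


rectangles = IndividualPattern("rect", is_rectangle).one_or_more()
triangle = IndividualPattern("tri", is_larger_triangle)
```

Here `rectangles` accepts any positive number of rectangles, while `triangle` stays a singleton whose condition depends on the rectangles accepted before it.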
The above are the possibilities offered when defining individual Patterns. These patterns can be combined into complex patterns (slide)
...by specifying the “contiguity conditions” between individual patterns and, potentially, a time constraint using the within() clause. The time constraint allows you to express use cases where, for example, “I want all my events to happen within 24h”.
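The idea behind the time constraint can be illustrated with a short check on the timestamps of a candidate match. This is a Python sketch and not FlinkCEP's implementation: the helper name `within` is borrowed from the clause only for readability, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def within(timestamps, window):
    """True if the first and last event of a match are at most `window` apart."""
    return max(timestamps) - min(timestamps) <= window


start = datetime(2017, 9, 11, 8, 0)
match_ok = [start, start + timedelta(hours=5), start + timedelta(hours=23)]
match_late = [start, start + timedelta(hours=25)]
```

With a 24h window, `match_ok` is kept and `match_late` is discarded.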
To understand contiguity, let’s take our pattern as shown on the left-hand side, and our previous input... (slide)
Previously we only accepted event sequences where the triangle strictly followed the rectangle without any non-matching events in-between. This is the first form of supported contiguity, called STRICT CONTIGUITY. FlinkCEP supports 2 more modes, namely RELAXED and NON-DETERMINISTIC RELAXED contiguity.
To understand relaxed contiguity, let’s focus on the green highlighted sequence in the input box. We see that with strict contiguity, this sequence is rejected, because between the green rectangle and triangle there is a circle. In many use-cases, we want the non-matching events to simply be ignored, without invalidating previous partial matches. EXAMPLE user interaction
For these use-cases, FlinkCEP also supports RELAXED CONTIGUITY, where non-matching events are simply ignored. EXAMPLE user interaction
Finally, non-deterministic relaxed contiguity further relaxes contiguity by allowing non-deterministic actions on relevant events. To illustrate this, let’s focus on the new highlighted green sequence in the input box. For this, we see that only the sequence containing the rectangle and the first triangle was accepted (slide)
In some cases, we want this pair to be accepted, but also to have a match containing the rectangle and the second triangle. For these cases, we have the non-deterministic relaxed contiguity. (slide)
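The difference between the three contiguity modes is easiest to see on a tiny input. The sketch below is a toy Python matcher for the "rectangle followed by a triangle of the same color" pattern. It is not how FlinkCEP is implemented, and the mode names `strict`, `relaxed` and `relaxed_any` are labels chosen for this example.

```python
from collections import namedtuple

Event = namedtuple("Event", ["id", "shape", "color"])


def find_matches(events, mode):
    """Return (rectangle id, triangle id) pairs of the same color.

    mode: "strict", "relaxed" or "relaxed_any" (non-deterministic relaxed).
    """
    if mode not in ("strict", "relaxed", "relaxed_any"):
        raise ValueError(mode)
    matches = []
    for i, first in enumerate(events):
        if first.shape != "rectangle":
            continue
        for candidate in events[i + 1:]:
            fits = candidate.shape == "triangle" and candidate.color == first.color
            if fits:
                matches.append((first.id, candidate.id))
                if mode != "relaxed_any":
                    break  # strict and relaxed stop at the first match
            elif mode == "strict":
                break  # a non-matching event invalidates the partial match
            # relaxed modes: ignore the non-matching event and keep looking
    return matches


stream = [
    Event(1, "rectangle", "green"),
    Event(2, "circle", "blue"),
    Event(3, "triangle", "green"),
    Event(4, "triangle", "green"),
]
```

On this stream, strict contiguity finds nothing because the circle sits between the rectangle and the first triangle, relaxed contiguity pairs the rectangle with the first triangle only, and the non-deterministic variant pairs it with both triangles.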
Finally, for cases where an event should invalidate a match, FlinkCEP also supports NOT patterns. More on this in the documentation. NOT patterns allow you to express use cases like SHOPLIFTING.
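The talk names shoplifting as a use case without spelling it out, so the following Python sketch rests on an assumed reading: a customer picks up an item and exits, with NO payment in between. The event layout and the function name `unpaid_exits` are hypothetical, and this is a toy model rather than FlinkCEP code.

```python
def unpaid_exits(events):
    """Return customers who exit after a pick-up that was not followed by a payment.

    events: iterable of (customer, action) with action in
    {"pick_up", "pay", "exit"}, in arrival order.
    """
    pending = set()  # customers with a pick-up that has not been paid yet
    alerts = []
    for customer, action in events:
        if action == "pick_up":
            pending.add(customer)
        elif action == "pay":
            pending.discard(customer)  # the NOT event invalidates the partial match
        elif action == "exit" and customer in pending:
            pending.discard(customer)
            alerts.append(customer)
    return alerts


visits = [
    ("c1", "pick_up"),
    ("c2", "pick_up"),
    ("c2", "pay"),
    ("c1", "exit"),
    ("c2", "exit"),
]
```

Only `c1` is reported here, because the payment by `c2` cancels that customer's partial match before the exit arrives.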
For now we intentionally ignore the “marked as fragile condition”.