Introducing the WSO2 Complex Event Processor
1. Introducing the WSO2 Complex Event Processor
Simplifying Complexities of Data Processing
S. Suhothayan
Software Engineer,
Data Technologies Team.
2. Outline
ƒ Introduction to CEP
ƒ WSO2 CEP Server
ƒ Siddhi Runtime
ƒ HA & Scalability of WSO2 CEP
ƒ WSO2 CEP server and WSO2 BAM
ƒ Use Cases
3. Event Processing (Contd.)
ƒ Event processing is about listening to events and
detecting patterns in near real-time without storing
all events.
ƒ Three models
o Simple Event Processing
- Simple filters (e.g. Is this a gold or platinum customer?)
o Event Stream Processing
- Looking across multiple event streams, joining them, etc.
o Complex Event Processing
- Processing multiple event streams to identify meaningful
patterns, using complex conditions & temporal windows
- E.g. There has been a more than 10% increase in overall
trading activity AND the average price of commodities
has fallen 2% in the last 4 hours
4. Complex Event Processing
ƒ We categorize events into different streams
ƒ Process with minimal storage
ƒ Use queries to evaluate the continuous event
streams (Usually SQL like query language)
ƒ Very fast results (in milliseconds range)
5. CEP Queries
ƒ The main types of queries are the following
o Filters and Projection
o Windows – events are processed within temporal
windows (e.g. for aggregation and joins).
Time window vs. length window.
o Ordering – identify event sequences and patterns
(e.g. for a credit card, a new location followed by a
small and then a large purchase might suggest fraud)
o Joins – join two streams
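The difference between a time window and a length window can be sketched in plain Python. This is only an illustration of the idea, not the API of any CEP engine; the class names `LengthWindow` and `TimeWindow` and the `(timestamp, value)` event shape are assumptions made for the example.

```python
from collections import deque


class LengthWindow:
    """Keeps the last `size` events, no matter when they arrived."""

    def __init__(self, size):
        self.events = deque(maxlen=size)  # oldest event drops out automatically

    def add(self, timestamp, value):
        self.events.append((timestamp, value))

    def values(self):
        return [v for _, v in self.events]


class TimeWindow:
    """Keeps only events newer than `span` time units before the latest event."""

    def __init__(self, span):
        self.span = span
        self.events = deque()

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Expire everything that is `span` or more time units old.
        while self.events and self.events[0][0] <= timestamp - self.span:
            self.events.popleft()

    def values(self):
        return [v for _, v in self.events]
```

With the same input, the length window always holds a fixed number of events, while the time window may hold many events or only one, depending on the arrival rate.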
6. Example Query
from p=PINChangeEvents#window.time(3600) join
t=TransactionEvents[amount>10000]#window.time(3600)
on p.custid==t.custid
return t.custid, t.amount;
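What this query computes can be sketched in plain Python, assuming events arrive in timestamp order as `(timestamp, kind, custid, amount)` tuples. The tuple layout, the `"pin"`/`"txn"` labels and the function name are hypothetical; the window size 3600 is taken from the query and its time unit is left abstract.

```python
from collections import deque

WINDOW = 3600  # same figure as in the query; the time unit is an assumption


def detect_pin_change_and_large_txn(events):
    """Join PIN changes with transactions over 10000 for the same customer.

    `events` is an iterable of (timestamp, kind, custid, amount) tuples in
    time order, where kind is "pin" or "txn" (hypothetical labels).
    Returns a list of (custid, amount) pairs, like the query's return clause.
    """
    pins = deque()  # (timestamp, custid) still inside the time window
    txns = deque()  # (timestamp, custid, amount) still inside the time window
    matches = []
    for ts, kind, custid, amount in events:
        # Expire events that have fallen out of either window.
        while pins and pins[0][0] <= ts - WINDOW:
            pins.popleft()
        while txns and txns[0][0] <= ts - WINDOW:
            txns.popleft()
        if kind == "pin":
            pins.append((ts, custid))
            matches += [(c, a) for _, c, a in txns if c == custid]
        elif kind == "txn" and amount > 10000:  # the [amount>10000] filter
            txns.append((ts, custid, amount))
            matches += [(custid, amount) for _, c in pins if c == custid]
    return matches
```

An arrival on either stream is matched against the other stream's window, which is what makes this a two-way join.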
7. Open-Source CEP Runtimes
ƒ Siddhi
o Apache License, a Java library, tuple-based event
model
o Supports distributed processing
o Supports multiple query models
- Based on a SQL-like language
- Filters, Windows, Joins, Ordering and others
ƒ Esper, http://esper.codehaus.org
o GPLv2 License, a Java library, Events can be XML, Map,
Object
o Supports multiple query models
- Based on a SQL-like language
- Filters, Windows, Joins, Ordering and others
ƒ Drools Fusion
o Apache License, a Java library
o Support for temporal reasoning + windows
8. WSO2 CEP Server
ƒ Enterprise grade server for CEP runtimes
ƒ Provides support for several transports
(network access) and data formats
o SOAP/WS-Eventing – XML messages
o REST/JSON – JSON messages
o JMS – map messages, XML messages
o Thrift – WSO2 data bridge format
- High-performance event capturing and delivery framework;
supports Java/C/C++/C# via Thrift language bindings.
ƒ Support multiple CEP runtimes
o Siddhi – WSO2, new, very fast, distributed
o Esper - well known CEP runtime
o Drools Fusion – rule based, but much slower
ƒ Easy to plug in new brokers and new CEP engines
10. CEP Buckets
ƒ A CEP bucket is a logical execution unit
ƒ Each CEP bucket has a set of queries, event sources, and
input/output event mappings.
ƒ It maps one-to-one to a CEP engine
11. Management UI
ƒ To define buckets
ƒ Update running queries without resetting current
execution states
ƒ Manage brokers (data adapters)
12. Developer Studio UI
ƒ Eclipse-based tool to define buckets
ƒ Can manage the configurations through the production
lifecycle
14. Big Picture
ƒ Users provide query/queries
ƒ Map event streams to queries
ƒ Siddhi keeps the queries running and invokes
callbacks registered against one or more
queries/streams
ƒ Example Query
from cseEventStream[symbol == 'IBM']#win.time(50000)
insert into IBMStockQuote symbol, avg(price) as avgPrice
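The flow above (register queries, map streams to them, receive callbacks) can be sketched with a toy runtime in plain Python. `MiniRuntime` and its methods are hypothetical and are not Siddhi's Java API; the sketch covers only filtering and projection and leaves out the time window and `avg(price)`.

```python
from collections import defaultdict


class MiniRuntime:
    """Toy event router: queries read one stream and write to another."""

    def __init__(self):
        self.queries = defaultdict(list)    # source stream -> query definitions
        self.callbacks = defaultdict(list)  # stream -> registered callbacks

    def add_query(self, source, predicate, project, target):
        """Register 'from source[predicate] insert into target project(...)'."""
        self.queries[source].append((predicate, project, target))

    def add_callback(self, stream, fn):
        """Register a function invoked for every event on `stream`."""
        self.callbacks[stream].append(fn)

    def send(self, stream, event):
        # Notify listeners first, then feed every query reading this stream.
        for fn in self.callbacks[stream]:
            fn(event)
        for predicate, project, target in self.queries[stream]:
            if predicate(event):
                self.send(target, project(event))
```

Usage, mirroring the stream names from the example query:

```python
rt = MiniRuntime()
rt.add_query(
    "cseEventStream",
    lambda e: e["symbol"] == "IBM",
    lambda e: {"symbol": e["symbol"], "price": e["price"]},
    "IBMStockQuote",
)
rt.add_callback("IBMStockQuote", print)
rt.send("cseEventStream", {"symbol": "IBM", "price": 75.0, "volume": 10})
```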
16. Siddhi Queries: Filters
from <stream-name> [<conditions>]*
insert into <stream-name>
ƒ Filters the events by conditions
ƒ Conditions
o >, <, ==, >=, <=, !=
o contains
o and, or, not
ƒ Example
from cseEventStream[price >= 20 and symbol == 'IBM']
insert into StockQuote symbol, volume
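For readers more familiar with general-purpose code, the filter example behaves roughly like the following Python generator. The dictionary event shape and the function name are assumptions made for illustration.

```python
def filter_stock_quotes(events):
    """Yield symbol and volume for IBM events priced at 20 or more."""
    for event in events:
        # The condition in square brackets: price >= 20 and symbol == 'IBM'
        if event["price"] >= 20 and event["symbol"] == "IBM":
            # The projection after the output stream name: symbol, volume
            yield {"symbol": event["symbol"], "volume": event["volume"]}
```

Each event is inspected once and then discarded, so a filter keeps no state between events.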
17. Window
from <stream-name> [<conditions>]#window.<window-name>(<parameters>)
insert [<output-type>] into <stream-name>
ƒ Types of Windows
o (Time | Length) (Sliding| Batch) windows
o Unique window, First unique (not supported in 1.0)
ƒ Type of aggregate functions
o sum, avg, max, min
ƒ Example
from cseEventStream[price >= 20]#window.lengthBatch(50)
insert expired-events into StockQuote
symbol, avg(price) as avgPrice
group by symbol
having avgPrice>50
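A rough Python sketch of the batch-window example: collect filtered events into batches of a fixed length, then aggregate each full batch per symbol. It is a simplification under stated assumptions; in particular it does not model the distinction between current and expired events, and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def length_batch_avg(events, batch_size=50, min_price=20, min_avg=50):
    """Average price per symbol over each full batch of filtered events."""
    batch = []
    output = []
    for event in events:
        if event["price"] < min_price:  # filter: price >= 20
            continue
        batch.append(event)
        if len(batch) == batch_size:    # the batch is full: aggregate it
            prices = defaultdict(list)
            for item in batch:          # group by symbol
                prices[item["symbol"]].append(item["price"])
            for symbol, values in prices.items():
                avg_price = sum(values) / len(values)
                if avg_price > min_avg:  # having avgPrice > 50
                    output.append({"symbol": symbol, "avgPrice": avg_price})
            batch = []                  # start the next batch
    return output
```

Events in an incomplete batch produce no output until the batch fills, which is the main behavioural difference from a sliding window.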
18. Join
from <stream>#<window> [unidirectional] join <stream>#<window>
on <condition> within <time>
insert into <stream>
ƒ Join two streams based on a condition and window
ƒ Join can be in multiple forms ((left|right|full outer) |
inner) join - only inner is supported in 1.0
ƒ Unidirectional – event arriving only to the
unidirectional stream triggers the join
ƒ Example
from TickEvent[symbol == 'IBM']#win.length(2000)
join NewsEvent#win.time(500)
insert into JoinStream *
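The effect of the unidirectional keyword can be illustrated in plain Python. The sketch assumes tick events have already been filtered, keeps ticks in a length window and news in a time window, and pairs every tick with every news item because the example has no `on` condition. The function name, the `"tick"`/`"news"` labels and the small default window sizes are assumptions.

```python
from collections import deque


def join_streams(events, length=2, span=500, unidirectional=False):
    """Join a length window of ticks with a time window of news.

    `events` holds (timestamp, stream, payload) tuples in time order,
    with stream being "tick" or "news". When `unidirectional` is True,
    only arriving ticks trigger the join.
    """
    ticks = deque(maxlen=length)  # length window
    news = deque()                # time window of (timestamp, payload)
    output = []
    for ts, stream, payload in events:
        # Expire news items older than the time window.
        while news and news[0][0] <= ts - span:
            news.popleft()
        if stream == "tick":
            ticks.append(payload)
            output += [(payload, n) for _, n in news]
        else:
            news.append((ts, payload))
            if not unidirectional:
                output += [(t, payload) for t in ticks]
    return output
```

In the two-way case a news item arriving after a tick produces a match; in the unidirectional case it is only stored and waits for the next tick.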
19. Pattern
from [every] <condition> -> [every] <condition> … <condition>
within <time>
insert into StockQuote (<attribute-name>* | * )
ƒ Checks whether condition A happens before/after condition B
ƒ Can do iterative checks via the “every” keyword.
ƒ Here, with “within <time>”, Siddhi emits only events
that are within that time of each other
ƒ Example
from every (a1 = purchase[price < 10])
-> a2 = purchase[price > 10000 and a1.cardNo == a2.cardNo]
within 300000
insert into potentialFraud
a2.cardNo as cardNo, a2.price as price, a2.place as place
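One possible reading of this pattern, sketched in plain Python: every small purchase opens a partial match, and a later large purchase on the same card within the time limit completes it. How a real engine treats overlapping partial matches may differ; the function name, the `ts` field and the choice to consume a partial match once it completes are assumptions.

```python
def detect_potential_fraud(purchases, within=300000):
    """Find a purchase under 10 followed by one over 10000 on the same card.

    `purchases` is a time-ordered list of dicts with the keys
    ts, cardNo, price and place (ts is a hypothetical timestamp field).
    """
    pending = []  # small purchases waiting for a follow-up ("every" keeps adding)
    alerts = []
    for p in purchases:
        # Drop partial matches that are older than the "within" limit.
        pending = [a1 for a1 in pending if p["ts"] - a1["ts"] <= within]
        if p["price"] > 10000:
            for a1 in pending:
                if a1["cardNo"] == p["cardNo"]:
                    alerts.append(
                        {"cardNo": p["cardNo"], "price": p["price"], "place": p["place"]}
                    )
            # Completed partial matches are consumed.
            pending = [a1 for a1 in pending if a1["cardNo"] != p["cardNo"]]
        if p["price"] < 10:
            pending.append(p)
    return alerts
```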
20. Sequence
from <event-regular-expression> within <time> insert into <stream>
ƒ Regular Expressions supported
o * - Zero or more matches (reluctant).
o + - One or more matches (reluctant).
o ? - Zero or one match (reluctant).
o or – either event
ƒ Events captured by * or + must be referenced using
square brackets to access a specific occurrence of
that event
from a1 = requestOrder[action == "buy"],
b1 = cseEventStream[price > a1.price and symbol == a1.symbol]+,
b2 = cseEventStream[price < b1.price]
insert into purchaseOrder
a1.symbol as symbol, b1[0].price as firstPrice, b2.price as orderPrice
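A simplified Python reading of this sequence, under several assumptions: the buy order has already matched, the ticks that follow are consecutive events on cseEventStream, `b1.price` in the last condition refers to the most recently captured b1 event, and any event that fits neither b1 nor b2 breaks the sequence. The function name is hypothetical.

```python
def match_sequence(order, ticks):
    """Match one or more rising ticks followed by a tick below the last one.

    `order` is the matched requestOrder event (dict with symbol and price);
    `ticks` are the consecutive cseEventStream events that follow it.
    Returns the output event, or None when the sequence does not complete.
    """
    rising = []  # events captured by b1+
    for tick in ticks:
        # Reluctant matching: try to finish with b2 as early as possible.
        if rising and tick["price"] < rising[-1]["price"]:
            return {
                "symbol": order["symbol"],
                "firstPrice": rising[0]["price"],  # b1[0].price
                "orderPrice": tick["price"],       # b2.price
            }
        if tick["symbol"] == order["symbol"] and tick["price"] > order["price"]:
            rising.append(tick)
        else:
            return None  # an event that fits neither b1 nor b2 ends the attempt
    return None
```

The index in `b1[0]` is what the square-bracket rule on this slide refers to: `+` can capture several events, so a specific occurrence has to be selected.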
21. Performance Results
ƒ We compared Siddhi with Esper, the widely used
open-source CEP engine
ƒ For the evaluation, we set up different queries on both
systems, pushed events into each system, and measured the
time until all of them were processed.
ƒ We used Intel(R) Xeon(R) X3440 @2.53GHz , 4 cores 8M
cache 8GB RAM running Debian 2.6.32-5-amd64 Kernel
22. Performance Comparison With ESPER
Simple filter without window
from StockTick[price > 6] return symbol, price
23. Performance Comparison With ESPER
State machine query for pattern matching
from f=FraudWarningEvent ->
p=PINChangeEvent(accountNumber=f.accountNumber)
return accountNumber;
24. Siddhi Features
ƒ Supports State Persistence
o Enabling Queries to span lifetimes much greater
than server uptime.
o By taking periodic snapshots and storing all state
information and windows to a scalable persistence
store (Apache Cassandra).
o Pluggable persistent stores.
ƒ Support Highly Available Deployment
o Using Hazelcast distributed cache as a shared
working memory.
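The snapshot idea can be sketched in a few lines of Python: serialize the window contents periodically, and rebuild the window from the last snapshot after a restart. Here a JSON string stands in for the persistence store; the class and method names are hypothetical and unrelated to Siddhi's actual persistence interfaces.

```python
import json
from collections import deque


class SnapshotableWindow:
    """Length window whose state can be saved and restored."""

    def __init__(self, size):
        self.size = size
        self.events = deque(maxlen=size)

    def add(self, event):
        self.events.append(event)

    def snapshot(self):
        """Serialize the full window state (would be written to a store)."""
        return json.dumps({"size": self.size, "events": list(self.events)})

    @classmethod
    def restore(cls, blob):
        """Rebuild a window from a snapshot taken before a restart."""
        state = json.loads(blob)
        window = cls(state["size"])
        window.events.extend(state["events"])
        return window
```

Events that arrive after the last snapshot and before a failure are lost in this sketch, which is why the snapshot interval matters.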
25. HA/ Persistence
ƒ This is the ability to recover runtime state in the
case of a failure
ƒ The CEP server can support this if the CEP engine
supports persistence (OK with Siddhi, Esper)
26. Scaling
ƒ A CEP pipeline can be distributed, but queries such as
windows, patterns, and joins are hard to distribute
ƒ WSO2 CEP with Siddhi uses a distributed cache
(Hazelcast) as shared memory, together with a selective
processing approach, to achieve massive scalability in
distributed processing
27. Event Recording
ƒ Ability to record all/some of the events for
future processing
ƒ A few options
o Publish them to Cassandra cluster using WSO2 data
bridge API or BAM (can process data in Cassandra
with Hadoop using WSO2 BAM).
o Write them to distributed cache
o Custom thrift based event recorder
31. Scenario
ƒ Monitoring stock exchange for game changing
moments
ƒ Two input event streams.
o Event stream of Stock Quotes from a stock
exchange
o Event stream of word count on various company
names from twitter pages
ƒ Check whether the last traded price of the
stock has changed significantly (by 2%) within the
last minute, and people are tweeting about that
company (word count > 10) within the last minute
36. Queries
from allStockQuotes[win.time(60000)]
insert into fastMovingStockQuotes
symbol,price, avg(price) as averagePrice
group by symbol
having ((price > averagePrice*1.02) or (averagePrice*0.98 > price ))
from twitterFeed[win.time(60000)]
insert into highFrequentTweets
company as company, sum(wordCount) as words
group by company
having (words > 10)
from fastMovingStockQuotes[win.time(60000)] as fastMovingStockQuotes
join highFrequentTweets[win.time(60000)] as highFrequentTweets
on fastMovingStockQuotes.symbol==highFrequentTweets.company
insert into predictedStockQuotes
fastMovingStockQuotes.symbol as company,
fastMovingStockQuotes.averagePrice as amount,
highFrequentTweets.words as words
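The combined logic of the three queries can be approximated in Python as a one-off evaluation over the events currently inside the one-minute windows. A real engine evaluates this continuously as events arrive; the function name and the tuple shapes are assumptions.

```python
from collections import defaultdict


def predicted_stock_quotes(quotes, tweets):
    """Find symbols whose price moved over 2% and that have over 10 tweet words.

    `quotes` is a list of (symbol, price) pairs and `tweets` a list of
    (company, word_count) pairs, both limited to the last minute and
    given in arrival order.
    """
    prices = defaultdict(list)
    for symbol, price in quotes:
        prices[symbol].append(price)
    words = defaultdict(int)
    for company, count in tweets:
        words[company] += count  # sum(wordCount) grouped by company

    alerts = []
    for symbol, values in prices.items():
        average = sum(values) / len(values)
        last = values[-1]
        # having (price > averagePrice*1.02) or (averagePrice*0.98 > price)
        moved = last > average * 1.02 or average * 0.98 > last
        if moved and words[symbol] > 10:  # join on symbol == company
            alerts.append({"company": symbol, "amount": average, "words": words[symbol]})
    return alerts
```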
37. Alert
ƒ As XML
<quotedata:StockQuoteDataEvent
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:quotedata="http://ws.cdyne.com/">
<quotedata:StockSymbol>{company}</quotedata:StockSymbol>
<quotedata:LastTradeAmount>{amount}</quotedata:LastTradeAmount>
<quotedata:WordCount>{words}</quotedata:WordCount>
</quotedata:StockQuoteDataEvent>
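How an output mapping of this kind fills its placeholders can be sketched in Python: attribute values from the output event replace `{company}`, `{amount}` and `{words}`. This shows the substitution idea only and is not how the CEP server implements it; `render_alert` is a hypothetical helper.

```python
from xml.sax.saxutils import escape

TEMPLATE = """<quotedata:StockQuoteDataEvent
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:quotedata="http://ws.cdyne.com/">
<quotedata:StockSymbol>{company}</quotedata:StockSymbol>
<quotedata:LastTradeAmount>{amount}</quotedata:LastTradeAmount>
<quotedata:WordCount>{words}</quotedata:WordCount>
</quotedata:StockQuoteDataEvent>"""


def render_alert(event):
    """Fill the XML template with XML-escaped values from the output event."""
    safe = {key: escape(str(value)) for key, value in event.items()}
    return TEMPLATE.format(**safe)
```

Escaping the values keeps the message well-formed even if a company name contains characters such as `&`.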
38. Useful links
ƒ WSO2 CEP 2.0.0 Milestone 2
https://svn.wso2.org/repos/wso2/people/suho/packs/cep/wso2cep-2.0.0-M2.zip
ƒ Distributed Processing Sample With Siddhi CEP
and ActiveMQ JMS Broker.
http://suhothayan.blogspot.com/2012/08/distributed-processing-sample-for-wso2.html