1. A HIGH THROUGHPUT COMPLEX EVENT DETECTION TECHNIQUE WITH BULK EVALUATION
Naotaka Nishimura (University of Tsukuba)
Hideyuki Kawashima (University of Tsukuba)
Hiroyuki Kitagawa (University of Tsukuba)
3. Big Data Streams
-- Volume, Velocity, Variety, Veracity, Value --
Social network: Facebook, 600 TB in a day (VLDB’13 Keynote)
Monitoring system: CISCO CRS-3, 322 Tbps
Science: LHC, 15 PB / year; LSST, 20 TB / day
4. Quick Review
Data Stream Management System (DSMS)
Example question: how many packets arrived for port 80 in the last minute?
Q1:
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
[Figure: Q1 is registered in the DSMS, which outputs the result (20 in the figure).]
Relational schema
Relation
eth0
・Destination IP
・Source IP
・Destination Port
・Source Port
・Interface (e.g. eth0)
・Length
・Version (e.g. IPv4)
・Payload
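The meaning of Q1 can be sketched as a small counter that is updated on every packet. This is only an illustration, not the DSMS implementation: the class name `WindowedPortCounter`, the timestamp unit (seconds), and the sample packets are assumptions made for this sketch, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class WindowedPortCounter:
    """Counts packets for one port seen within the last `window_sec` seconds."""

    def __init__(self, port, window_sec=60.0):
        self.port = port
        self.window_sec = window_sec
        self.times = deque()  # arrival times of matching packets, oldest first

    def on_packet(self, ts, dst_port):
        """Feed one packet (timestamp in seconds) and return the current count."""
        if dst_port == self.port:          # WHERE port = 80
            self.times.append(ts)
        # FROM eth0[TIME 1 MIN]: evict packets that fell out of the window
        while self.times and self.times[0] <= ts - self.window_sec:
            self.times.popleft()
        return len(self.times)             # SELECT COUNT(*)


counter = WindowedPortCounter(port=80)
packets = [(0.0, 80), (10.0, 22), (30.0, 80), (65.0, 80)]  # hypothetical input
counts = [counter.on_packet(ts, port) for ts, port in packets]
print(counts)
```

The packet at 0.0 s is evicted once the packet at 65.0 s arrives, so the count never includes packets older than one minute.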
5. SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
[Figure: inside the DSMS, data flows from the input adapter through the query operators w, σ, and α to the output adapter, which delivers the result to users/apps.]
The SQL query is translated into an operator tree.
When data arrives, the tree is evaluated.
The operators are based on those of relational databases:
w (Window): cuts a finite relation out of a stream
σ (Selection): filters tuples
α (Aggregation): computes aggregates such as AVG, MIN, MAX
CEP (complex event processing)
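The operator tree for the query can be sketched as three plain functions composed bottom-up. The function names, the dictionary layout of a tuple (`ts`, `port`), and the sample data are assumptions for illustration; a real DSMS evaluates the tree incrementally rather than over a list.

```python
def window(stream, now, range_sec):
    """w: cut a finite relation (list of tuples) out of the stream."""
    return [t for t in stream if now - range_sec < t["ts"] <= now]


def selection(relation, predicate):
    """σ: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def aggregation(relation, func):
    """α: reduce the relation to a single value (COUNT, AVG, MIN, MAX, ...)."""
    return func(relation)


def evaluate_q1(stream, now):
    """Operator tree for the query, evaluated bottom-up: α(σ(w(eth0)))."""
    rel = window(stream, now, range_sec=60)
    rel = selection(rel, lambda t: t["port"] == 80)
    return aggregation(rel, len)


# Hypothetical tuples; only the two attributes used by the query are modeled.
eth0 = [
    {"ts": 5, "port": 80},
    {"ts": 20, "port": 443},
    {"ts": 50, "port": 80},
    {"ts": 70, "port": 80},
]
q1_result = evaluate_q1(eth0, now=70)
print(q1_result)
```

At `now=70` the window keeps the tuples with timestamps 20, 50, and 70, the selection drops the port-443 tuple, and the aggregation counts the remaining two.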
6. Complex Event Processing (CEP)
Detect a certain pattern from input stream data
Stream data: A1, A2, B3, C4, E5, D6, D7, …
Query pattern: A→B→D
Pattern occurrences (sequences of events that match the user-specified pattern):
A1→B3→D6, A2→B3→D6, A1→B3→D7, A2→B3→D7, …
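What counts as a pattern occurrence can be made concrete with a brute-force enumeration. This is a naive reference sketch, not the technique presented in these slides; the function name `find_occurrences` and the `(type, timestamp)` event encoding are assumptions.

```python
def find_occurrences(events, pattern):
    """Return every subsequence of `events` whose types spell out `pattern`.

    events:  list of (event_type, timestamp) in arrival order
    pattern: list of event types, e.g. ["A", "B", "D"]
    """
    results = []

    def extend(start, depth, partial):
        if depth == len(pattern):
            results.append(tuple(partial))
            return
        for i in range(start, len(events)):
            if events[i][0] == pattern[depth]:
                extend(i + 1, depth + 1, partial + [events[i]])

    extend(0, 0, [])
    return results


stream = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("E", 5), ("D", 6), ("D", 7)]
occurrences = find_occurrences(stream, ["A", "B", "D"])
for occ in occurrences:
    print("→".join(f"{t}{ts}" for t, ts in occ))
```

For the stream A1 A2 B3 C4 E5 D6 D7 this yields the four occurrences listed on the slide. C4 and E5 are skipped because they do not appear in the pattern.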
7. Complex Event Processing (CEP)
Use case: order management in a restaurant.
Detect a guest who passed the entrance and then took a seat.
Pattern: Entrance→Seat
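[Figure: restaurant floor map with areas labeled Entrance, Floor, Toilet, and Seat1 to Seat6, next to a table of RFID readings with columns Place, Time, and TagID. The readings shown are timestamped 9:54:11 (tag xx), 10:10:01 (tag xx), 10:10:31 (tag yy), and 10:11:11 (tag yy).]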
A pattern occurrence is constructed by 2
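The Entrance→Seat pattern only makes sense when both readings belong to the same tag. A minimal sketch of that per-tag matching follows; the function name and the sample readings are hypothetical (the readings are loosely modeled on the slide's table, not copied from it).

```python
def entrance_then_seat(readings):
    """Return (entrance_reading, seat_reading) pairs that share a TagID.

    readings: list of (place, time, tag_id) tuples in time order.
    """
    seen_at_entrance = {}   # tag_id -> Entrance readings seen so far
    matches = []
    for place, time, tag_id in readings:
        if place == "Entrance":
            seen_at_entrance.setdefault(tag_id, []).append((place, time, tag_id))
        elif place.startswith("Seat"):
            for first in seen_at_entrance.get(tag_id, []):
                matches.append((first, (place, time, tag_id)))
    return matches


# Hypothetical readings for illustration only.
readings = [
    ("Entrance", "09:54:11", "xx"),
    ("Toilet",   "10:10:01", "xx"),
    ("Entrance", "10:10:31", "yy"),
    ("Seat5",    "10:11:11", "yy"),
]
matches = entrance_then_seat(readings)
print(matches)
```

With this input only tag yy produces a match, because tag xx is never read at a seat.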
9. SASE [1] Overview (1/2)
[1] High-Performance Complex Event Processing over Streams, ACM SIGMOD 2006
SASE detects specified patterns using an NFA (nondeterministic finite automaton).
NFA (quick review)
An NFA is a finite automaton that can be in multiple states at the same time.
A finite automaton (FA) is a machine that moves from its current state to a next state on each input symbol. It consists of a set of states, an initial state, accepting states, input symbols, and a transition function.
Ex) NFA that detects A→B→D
• Self transition: a self-loop transition that is taken on every event.
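The NFA for a sequence pattern can be simulated by tracking the set of states it is currently in. The sketch below is an assumption-laden illustration: the function name `run_nfa` is made up, state i is taken to mean "the first i pattern elements have been matched", and the start state is kept active so that later events can begin new matches.

```python
def run_nfa(events, pattern):
    """Simulate the NFA for a sequence pattern and report accepting events.

    State 0 is the start state; state len(pattern) is the accepting state.
    """
    accept = len(pattern)
    active = {0}                      # set of states the NFA is in right now
    accepted_at = []
    for ev_type, ts in events:
        nxt = set(active)             # staying in place models the self-loops (*)
        for s in active:
            if s < accept and pattern[s] == ev_type:
                nxt.add(s + 1)        # forward transition on a matching event
        if accept in nxt:
            accepted_at.append((ev_type, ts))
            nxt.discard(accept)       # report each acceptance once
        active = nxt
    return accepted_at


events = [("A", 1), ("C", 2), ("B", 3), ("A", 4), ("D", 5)]
accepted = run_nfa(events, ["A", "B", "D"])
print(accepted)
```

The return value only says which events drove the automaton into the accepting state; it does not say which earlier A and B events took part in the match.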
10. SASE Overview (2/2)
Problem of the NFA:
An NFA can detect the specified patterns, but it does not produce pattern occurrences (the sequences of input events that led to the accepting state).
SASE
Uses a stack structure (AIS) to output pattern occurrences.
AIS (Active Instance Stack)
One AIS is prepared for each state.
[Figure: NFA for A→B→D with states 0 to 3, transitions labeled A, B, and D, two self-loops (*), and three AISs attached to its states.]
11. Behavior of SASE (1/3)
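Translate the query pattern into an NFA.
[Figure: the pattern A→B→D becomes an NFA with states 0 to 3, transitions labeled A, B, and D, and two self-loops (*).]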
12. Behavior of SASE (2/3)
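Prepare an AIS for each state of the NFA.
Create a link when an event is pushed.
Event arrival sequence (in time order t): a1 c2 b3 a4 d5
[Figure: NFA states 0 to 3 with their AISs. The AIS of state 1 holds a1 and a4, the AIS of state 2 holds b3, and the AIS of state 3 holds d5.]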
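Pushing events into the stacks and recording links can be sketched as follows. The class and field names are assumptions, events are encoded as strings such as `"a1"` whose first character is the event type, and the link is stored as the index of the most recent entry in the previous stack at push time, which is one common way to realize such a link.

```python
class ActiveInstanceStacks:
    """One stack per non-initial NFA state; each entry links to the previous stack."""

    def __init__(self, pattern):
        self.pattern = pattern
        # stacks[i] holds (event, rip) for pattern[i]; rip is the index of the
        # most recent entry in stacks[i-1] at push time (None for the first stack).
        self.stacks = [[] for _ in pattern]

    def push(self, event):
        ev_type = event[0]
        # Visit later states first so one event cannot chain through two states.
        for i in reversed(range(len(self.pattern))):
            if self.pattern[i] != ev_type:
                continue
            if i == 0:
                self.stacks[0].append((event, None))
            elif self.stacks[i - 1]:          # the previous state must be active
                rip = len(self.stacks[i - 1]) - 1
                self.stacks[i].append((event, rip))


ais = ActiveInstanceStacks(["a", "b", "d"])
for ev in ["a1", "c2", "b3", "a4", "d5"]:
    ais.push(ev)
print(ais.stacks)
```

After the five events, b3 links to a1 (the only A event present when b3 arrived) and d5 links to b3. The event c2 matches no state and is ignored.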
13. Behavior of SASE (3/3)
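When the accepting state is reached, create a pattern occurrence by following the link information.
[Figure: the same NFA and AISs (a1 and a4; b3; d5). Following the links back from d5 yields the pattern occurrence a1 b3 d5.]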
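Constructing occurrences from the links can be sketched as a backward traversal. The stack contents below are written out by hand for the sequence a1 c2 b3 a4 d5; the `(event, rip)` entry layout and the function name `construct` are assumptions of this sketch, where `rip` is the index of the most recent entry in the previous stack when the event was pushed.

```python
def construct(stacks, level, index):
    """Return all occurrences that end with stacks[level][index].

    An entry may be combined with every entry of the previous stack from
    position 0 up to and including rip, because those are exactly the
    instances that were pushed before it.
    """
    event, rip = stacks[level][index]
    if level == 0:
        return [[event]]
    occurrences = []
    for prev_index in range(rip + 1):
        for prefix in construct(stacks, level - 1, prev_index):
            occurrences.append(prefix + [event])
    return occurrences


# AIS contents after a1 c2 b3 a4 d5, written out by hand for this sketch.
stacks = [
    [("a1", None), ("a4", None)],   # state 1
    [("b3", 0)],                    # state 2: b3 links to a1
    [("d5", 0)],                    # state 3: d5 links to b3
]
result = construct(stacks, level=2, index=0)
print(result)
```

The event a4 is not part of any occurrence here because it arrived after b3, which the link index 0 on b3 encodes.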
14. Problem of SASE we found
Duplicate generation (e.g. b3 → a1)
IDEA: If we can evaluate d7 and d9 together, the cost for constructing a1-b3 should be reduced (2 to 1).
[Figure: two snapshots of the NFA and its AISs for the events b6, d7, a8, d9. The AISs hold a1, a4, a8 and b3, b6. Result generation runs once when d7 arrives and again when d9 arrives, and both runs rebuild the same combinations of A and B events.]
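The duplicated work can be counted directly. In the sketch below the stack contents are written out by hand for the events a1, b3, a4, b6, d7, a8, d9; the variable names and the link encoding (index of the latest entry in the previous stack at push time) are assumptions, and the per-event loop imitates evaluating each D event separately.

```python
from collections import Counter

a_stack = ["a1", "a4", "a8"]
b_stack = [("b3", 0), ("b6", 1)]       # (event, index of latest A when pushed)
d_events = [("d7", 1), ("d9", 1)]      # (event, index of latest B when pushed)


def prefixes(b_rip):
    """All (a, b) pairs reachable from a D event whose link points at b_stack[b_rip]."""
    return [
        (a_stack[i], b)
        for b, a_rip in b_stack[: b_rip + 1]
        for i in range(a_rip + 1)
    ]


built = Counter()                      # how often each (a, b) prefix is constructed
results = []
for d, b_rip in d_events:              # one evaluation per D event
    for a, b in prefixes(b_rip):
        built[(a, b)] += 1
        results.append((a, b, d))

print(built)
```

Each of the three prefixes, including a1-b3, is constructed twice: once for d7 and once for d9. Evaluating both D events together would bring that down to once.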
16. Concept: Bulk Evaluation
[Figure: timeline t of the events a1 c2 b3 a4 d5 b6 d7 a8 d9 b10 d11 d12 b13. SASE generates results five times along the timeline, while the proposal generates results only twice.]
17. Behavior of Proposal (1/3)
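Create a link when an event is pushed to an AIS.
Unlike SASE, keep the D events.
Event arrival sequence (in time order t): a1 c2 b3 a4 d5 b6 d7 a8 d9
[Figure: NFA states 0 to 3 with their AISs, holding a1, a4, a8 (state 1), b3, b6 (state 2), and d5, d7, d9 (state 3).]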
18. Behavior of Proposal (2/3)
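Create clusters on the final AIS.
[Figure: the AISs before and after clustering. They hold a1, a4, a8 (state 1) and b3, b6 (state 2); the D events d5, d7, and d9 in the final AIS (state 3) are grouped into clusters.]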
19. Behavior of Proposal (3/3)
Create pattern occurrences in bulk.
The result for d9 is made from the result for d7.
[Figure: NFA and AISs as before. Pattern occurrences: a1 b3 d5; a1 b3 d7, a1 b6 d7, a4 b6 d7; a1 b3 d9, a1 b6 d9, a4 b6 d9.]
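One possible reading of the three steps is sketched below: D events are kept in the final stack, grouped into clusters that share the same link target, and the A-B prefixes are built once per cluster and reused for every D event in it. The clustering criterion, all function names, and the choice to evaluate once at the end of the input are assumptions of this sketch; the slides do not spell them out.

```python
from itertools import groupby

PATTERN = ["a", "b", "d"]


def build_stacks(events):
    """Push events into one stack per pattern element, recording links (rip)."""
    stacks = [[] for _ in PATTERN]
    for ev in events:
        for i in reversed(range(len(PATTERN))):
            if PATTERN[i] != ev[0]:
                continue
            if i == 0:
                stacks[0].append((ev, None))
            elif stacks[i - 1]:
                stacks[i].append((ev, len(stacks[i - 1]) - 1))
    return stacks


def prefixes(stacks, level, rip):
    """All partial occurrences ending in one of stacks[level][0..rip]."""
    out = []
    for ev, prev_rip in stacks[level][: rip + 1]:
        if level == 0:
            out.append([ev])
        else:
            out.extend(p + [ev] for p in prefixes(stacks, level - 1, prev_rip))
    return out


def bulk_evaluate(stacks):
    """Group final-stack events by link target and build each prefix set once."""
    results, prefix_builds = [], 0
    # Links in the final stack never decrease, so equal links are adjacent.
    for rip, cluster in groupby(stacks[-1], key=lambda entry: entry[1]):
        shared = prefixes(stacks, len(stacks) - 2, rip)   # built once per cluster
        prefix_builds += 1
        for ev, _ in cluster:
            results.extend(p + [ev] for p in shared)
    return results, prefix_builds


stacks = build_stacks(["a1", "c2", "b3", "a4", "d5", "b6", "d7", "a8", "d9"])
results, prefix_builds = bulk_evaluate(stacks)
for occ in results:
    print(" ".join(occ))
```

Under these assumptions d5 forms a cluster of its own, and d7 and d9 share one cluster because both link to b6. The prefixes are therefore built twice instead of three times, while the seven occurrences are the same as with per-event evaluation.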
24. Conclusions and Future Work
Conclusions
SASE leaves room for further improvement in throughput.
The bulk evaluation scheme improved throughput.
By a factor of up to 5.24
Future work
Implementing the proposal in Falcon
26. Falcon
- UDP-RX
- Window join (64 cores)
Performance Monitor
Basic: 6.7 million tuples per second
Proposal: 14.6 million tuples per second