6. 3 Research Topics
• Falcon
– In-DSMS analytics system
– Multiple query optimization for CPD
• Window Operators
– Join operator over data streams
• CryptStream
– Privacy preserving stream data processing
7. A Multiple Query Optimization Scheme
for Change Point Detection on Falcon
Joint work with Masahiro OKE
Presented at BIRTE’13
8. Quick Review: Data Stream Management System (DSMS)
Q1: How many packets arrive for port 80 in a minute?
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
Relational schema of the relation eth0 in the DSMS:
・Destination IP
・Source IP
・Destination Port
・Source Port
・Interface (e.g. eth0)
・Length
・Version (e.g. IPv4)
・Payload
9. • SQL is translated to an operator tree.
• On arrival of data, the tree is evaluated.
• Operators are based on relational database operators
– w (Window): cuts a finite relation out of a stream
– σ (Selection): filters tuples
– α (Aggregation): e.g. AVG, MIN, MAX
Figure: Data → Input adapter → w → σ → α → Output adapter → Result (inside the DSMS); Users/Apps. register the query and receive the result.
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
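The w → σ → α pipeline for this query can be sketched in Python as three chained generators. This is only an illustration under assumptions, not Falcon's implementation: the tuple layout (dicts with hypothetical "ts" and "port" fields), the function names, and the choice to re-evaluate the tree on every arriving tuple are all made up for the example.

```python
from collections import deque

def window(stream, seconds):
    """w: for each arriving tuple, yield the relation of tuples from the last `seconds`."""
    buf = deque()
    for t in stream:
        buf.append(t)
        # Evict tuples that have fallen out of the time window.
        while buf and buf[0]["ts"] <= t["ts"] - seconds:
            buf.popleft()
        yield list(buf)

def selection(relations, pred):
    """sigma: keep only the tuples that satisfy the predicate."""
    for rel in relations:
        yield [t for t in rel if pred(t)]

def aggregation_count(relations):
    """alpha: COUNT(*) over each relation."""
    for rel in relations:
        yield len(rel)

def run_query(stream):
    # SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80
    tree = aggregation_count(
        selection(window(stream, 60), lambda t: t["port"] == 80))
    return list(tree)

packets = [
    {"ts": 0, "port": 80},
    {"ts": 10, "port": 22},
    {"ts": 30, "port": 80},
    {"ts": 70, "port": 80},
]
counts = run_query(packets)
```

Each element of `counts` is the query result after one more packet has arrived; the packet at ts = 70 pushes the first two packets out of the one-minute window.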
10. A Target Application: Malware Detection
• Real datasets
– “Anti Malware engineering WorkShop 2013 (MWS 2013)”
– Extracted by NEGI, proposed by Dr. Shinichi Isida.
• NICTER
– Keeps about 160,000 unused IP addresses (darknet)
• Packets sent to the darknet are considered attacks.
– Uses CPD (Change Point Detection) [1] to detect attacks such as DoS (denial of service).
[1] Daisuke Inoue, K. Yoshioka, M. Eto, Masaya Yamagata, Eisuke Nishino, Jun-ichi Takeuchi, Kazuya Ohkouchi, Koji Nakao: An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques. ICONIP (1) 2008: 579-586
[2] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE TKDE, pp.482-492, 2006.
12. Example Query on Falcon (1/2)
• Number of accesses for each port? [1]
• Group-by aggregates
SELECT dst_port,
COUNT(dst_port)
FROM pkt[1 sec]
GROUP BY dst_port
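Schema of pkt: src_ip, dst_ip, src_port, dst_port, seq_no, packet_size, timestamp, protocol, ack, fin, syn, urg, push, reset, content
Figure (g-pkt): packets with dst_port 22, 80, 15, 80, 22 arrive at the NIC within 1 second; the query outputs 22: 2, 80: 2, 15: 1.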
[1] Divesh Srivastava (AT&T Labs), et al., “Enabling Real Time Data Analysis”, keynote talk, VLDB 2010 (a similar query is found on p. 15 of the talk slides).
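As a rough illustration of what the group-by query computes over one window, the Python sketch below counts packets per destination port. The function name and the dict-based tuple layout are assumptions for the example; only the dst_port values are taken from the slide.

```python
from collections import Counter

def count_by_dst_port(window_tuples):
    """SELECT dst_port, COUNT(dst_port) ... GROUP BY dst_port, over one window."""
    return Counter(t["dst_port"] for t in window_tuples)

# The five packets that arrive within the 1-second window on the slide.
one_second = [{"dst_port": p} for p in (22, 80, 15, 80, 22)]
result = count_by_dst_port(one_second)
```

For this window, `result` maps 22 to 2, 80 to 2, and 15 to 1, matching the output shown on the slide.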
13. Example Query on Falcon (2/2)
• Accesses on each port? [2]
• Outlier score for each port per second
select dst_port,
cpd(dst_port)
from pkt[1 sec]
group by dst_port
Schema of pkt: src_ip, dst_ip, src_port, dst_port, seq_no, packet_size, timestamp, protocol, ack, fin, syn, urg, push, reset, content
Figure (g-cpd-pkt): packets with dst_port 22, 80, 15, 80, 22 arrive at the NIC within 1 second; the query outputs 22: 1.33, 80: 2.44, 15: 1.22.
[2] Daisuke Inoue (NICT), et al., “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques”, ICONIP (1) 2008: 579-586
14. Dividing CPD into 4 operators
Figure (pipeline of the 4 operators):
CPD-1: 1st stage learning (input: time series data x_t)
CPD-2: Compute outlier score and moving average score, using the probability provided by 1st stage learning (showing the outlier score is omitted)
CPD-3: 2nd stage learning (input: moving average score)
CPD-4: Compute outlier score and moving average score, using the probability provided by 2nd stage learning
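The four-operator pipeline can be sketched in Python as follows. This is a simplified stand-in, not the CPD used on Falcon: each learning stage here is a discounting Gaussian (mean and variance only) instead of the autoregressive model of [2], the outlier score is taken as the negative log-likelihood, and the names r1, T1, r2, T2 are hypothetical parameters (a discount rate and a smoothing window per stage).

```python
import math
from collections import deque

class DiscountingGaussian:
    """Online Gaussian that forgets old data at rate r (stand-in for a learning stage)."""
    def __init__(self, r):
        self.r, self.mean, self.var = r, 0.0, 1.0

    def score_then_learn(self, x):
        # Outlier score: negative log-likelihood under the model learned so far.
        score = (0.5 * math.log(2 * math.pi * self.var)
                 + (x - self.mean) ** 2 / (2 * self.var))
        # Learning: discounted update of the mean, then of the variance
        # (the variance update uses the already-updated mean).
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
        self.var = max(self.var, 1e-6)  # keep the variance strictly positive
        return score

class MovingAverage:
    """Average of the last T values."""
    def __init__(self, T):
        self.buf = deque(maxlen=T)

    def push(self, v):
        self.buf.append(v)
        return sum(self.buf) / len(self.buf)

def cpd_scores(series, r1=0.1, T1=3, r2=0.1, T2=3):
    stage1, smooth1 = DiscountingGaussian(r1), MovingAverage(T1)
    stage2, smooth2 = DiscountingGaussian(r2), MovingAverage(T2)
    out = []
    for x in series:
        y = smooth1.push(stage1.score_then_learn(x))          # CPD-1, CPD-2
        out.append(smooth2.push(stage2.score_then_learn(y)))  # CPD-3, CPD-4
    return out

# A series whose level jumps at index 30.
series = [1.0] * 30 + [10.0] * 30
scores = cpd_scores(series)
```

In this sketch the score at the jump is far larger than any score before it, which is the behaviour a change point detector is meant to show.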
15. Problem of CPD: Parameter setting
• CPD requires 6 parameters (α_R, α_K, α_T, β_R, β_K, β_T)
• Appropriate parameter setting is necessary, but it is difficult
Figure: two plots, “Using appropriate parameter set” and “Using inappropriate parameter set” (blue: number of accesses, red: CPD score).
16. A simple way for parameter tuning:
---Multiple CPDs with different parameter sets---
Figure: each input packet is fed to several CPD instances, one per parameter set (each instance: 1st stage learning, compute outlier score, 2nd stage learning, compute outlier score); their outputs go to result aggregation (e.g. majority voting).
Issue: How to accelerate multiple CPD executions?
Approach: Multiple query optimization
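The result aggregation step could look like the Python sketch below. The slide only names majority voting as an example, so the details are assumptions: each CPD instance is taken to raise an alarm when its score exceeds a hypothetical threshold, and the votes are then counted.

```python
def majority_vote(alarms):
    """True if more than half of the CPD instances raised an alarm."""
    return sum(alarms) * 2 > len(alarms)

def aggregate(scores_per_parameter_set, threshold):
    """Turn one score per parameter set into a single decision."""
    alarms = [score > threshold for score in scores_per_parameter_set]
    return majority_vote(alarms)

# Hypothetical scores from four CPD instances at the same time step.
attack = aggregate([2.4, 0.3, 3.1, 2.9], threshold=1.0)
quiet = aggregate([2.4, 0.3, 0.2, 0.1], threshold=1.0)
```

Here `attack` is True (three of four instances agree) and `quiet` is False (only one instance fires).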
17. The 4 sharing patterns
-- Only branch cases, not merge --
Figure: four panels (Pattern 1 to Pattern 4). Each panel shows CPD pipelines (1st stage learning, compute outlier score, 2nd stage learning, compute outlier score) that share their leading operators and then branch.
NOTE: “1st stage learning” and “2nd stage learning” can be divided into sub-operators, and some of the sub-operators can also be shared. The sharing patterns are described in the paper.
Pattern 1: Sharing CPD-1 if α_R and α_K are the same.
Pattern 2: Sharing CPD-1, 2 if α_R, α_K and α_T are the same.
Pattern 3: Sharing CPD-1, 2, 3 if α_R, α_K, α_T, β_R and β_K are the same.
Pattern 4: Sharing CPD-1, 2, 3, 4 if α_R, α_K, α_T, β_R, β_K and β_T are the same.
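The four rules amount to comparing a prefix of the parameter tuple. The Python sketch below decides which pattern applies to two parameter sets; the type and function names are hypothetical, and the treatment of K and T as integers is an assumption.

```python
from typing import NamedTuple

class CpdParams(NamedTuple):
    alpha_r: float
    alpha_k: int
    alpha_t: int
    beta_r: float
    beta_k: int
    beta_t: int

# Pattern number -> how many leading parameters must be equal.
PREFIX_LENGTHS = {1: 2, 2: 3, 3: 5, 4: 6}

def sharing_pattern(p, q):
    """Largest pattern (1 to 4) under which p and q can share operators, or 0 if none."""
    best = 0
    for pattern, n in PREFIX_LENGTHS.items():
        if p[:n] == q[:n]:
            best = pattern
    return best

a = CpdParams(0.01, 2, 5, 0.01, 2, 5)
same_until_beta_t = sharing_pattern(a, a._replace(beta_t=7))     # shares CPD-1, 2, 3
same_until_alpha_t = sharing_pattern(a, a._replace(alpha_t=9))   # shares CPD-1 only
nothing_shared = sharing_pattern(a, a._replace(alpha_r=0.05))    # no sharing
```

A pattern number n means that operators CPD-1 through CPD-n can be evaluated once and their output fed to both queries.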
18. Experiment
Measuring execution time when sharing ONLY 1st stage learning
– Implemented CPD in C++ with the Eigen library (for matrix manipulation).
– Measured execution time using this CPD implementation.
20. 3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators accelerated by FPGA
– Join operator over data streams
• CryptStream
– Privacy preserving stream data processing
21. A Novel Architecture of
Merging Network
for Handshake Join
Joint work with
Yasin OGE, Takefumi MIYOSHI, Tsutomu YOSHINAGA
Presented at
ICNC’11, FPL’11, MCSoC’12, SSDBM’13.
22. FPGA instead of Data Center
Demo System at SSDBM’13
Falcon
- UDP-RX
- Window join (64-cores)
Performance Monitor
52. Falcon
- UDP-RX
- Window join (64-cores)
Performance Monitor
Basic: 6.7 million tuples per second
Proposal: 14.6 million tuples per second
53. Wire-Speed Implementation of
Sliding-Window Aggregate Operator
over Out-of-Order Data Streams
Int’l Symposium on Embedded Multicore/Many-core SoCs
Sept. 26–28, 2013, Tokyo
Yasin Oge∗, Masato Yoshimi∗, Takefumi Miyoshi†, Hideyuki Kawashima‡, Hidetsugu Irie∗, and Tsutomu Yoshinaga∗
∗Univ. of Electro-Communications
†e-trees.Japan, Inc.
‡Univ. of Tsukuba
54. 3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
• CryptStream
– Privacy preserving stream data processing
55. A Security aware Stream Data
Processing Scheme on the Cloud
Joint work with
Katsuhiro TOMIYAMA
Introduced at
CloudDB’11
56. Related work: CryptDB
• CryptDB [R. A. Popa, et al. SOSP’11]
– Realizes relational operators over encrypted data on a traditional RDBMS
# Our research goal is to achieve this on an SPE
– Encrypts each value of the data stored in the DBMS
• Uses more than one type of encryption, each with different characteristics
⇒ Three kinds of ciphertext are generated for each plain value
Figure: client and database proxy (trusted area); DBMS with UDFs such as DECRYPT_RND, DECRYPT_DET, … (untrusted area).
Plain value: val = 80
Encrypted value: val-DET = DET(80), val-OPE = OPE(80), val-HOM = HOM(80)
57. Ciphers of CryptDB/CryptStream
• DET (Deterministic)
– Can check the equality of two encrypted values.
– x = y ⇔ DET_K(x) = DET_K(y)
• OPE (Order-preserving)
– Can check the order (inequality) of two encrypted values.
– x < y ⇔ OPE_K(x) < OPE_K(y)
• HOM (Homomorphic)
– Can execute addition over two encrypted values.
– HOM_K(x) ∙ HOM_K(y) = HOM_K(x + y)
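The HOM property can be illustrated with a toy Paillier cryptosystem, the scheme CryptDB uses for HOM; whether CryptStream uses exactly this scheme is an assumption here. The Python sketch below uses tiny hard-coded primes and is not secure; real deployments use much larger primes. Because Paillier encryption is randomized, the equality on the slide holds after decryption: the product of two ciphertexts decrypts to the sum of the plaintexts.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation with small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """c = g^m * r^n mod n^2 with g = n + 1 and a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

n, priv = keygen()
cx, cy = encrypt(n, 80), encrypt(n, 32)
# Multiplying ciphertexts adds the plaintexts.
total = decrypt(n, priv, (cx * cy) % (n * n))
```

In this run `total` is 80 + 32, computed without decrypting either operand on its own.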
59. Encrypted stream data processing scheme
• The side effect of encryption: increased tuple size
– Three kinds of ciphertext are generated for each value
• Already implemented on Falcon
• Performance improvement is left for future work
– Partially resolved by our work at CloudDB’11.
Plain tuple:
id | temp
1 | 32
Encrypted tuple:
id-DET | id-OPE | id-HOM | temp-DET | temp-OPE | temp-HOM
DET(1) | OPE(1) | HOM(1) | DET(32) | OPE(32) | HOM(32)
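The growth in tuple size can be made concrete with a small Python sketch that expands a plain tuple into its encrypted columns. No real encryption is performed: the values are symbolic strings such as "DET(1)", and the function name is hypothetical.

```python
def expand_tuple(plain, ciphers=("DET", "OPE", "HOM")):
    """Map each plain attribute to one symbolic ciphertext column per cipher."""
    return {
        f"{name}-{cipher}": f"{cipher}({value})"
        for name, value in plain.items()
        for cipher in ciphers
    }

plain = {"id": 1, "temp": 32}
encrypted = expand_tuple(plain)
growth = len(encrypted) / len(plain)
```

With three ciphers per value, the number of attributes triples (`growth` is 3.0) before the larger size of each ciphertext is even taken into account.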
60. 3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
– Aggregate operator over data streams
• CryptStream
– Privacy preserving stream data processing
These 3 research topics are in progress...
63. Frontiers for Accelerators
SQL (e.g. Netezza, Exadata)
– Types of relational operator are limited.
– Red ocean, but big return!
NoSQL, Privacy, DM&ML
– New techs are created every day!
– Blue ocean, many chances!
– An FPGA memcached appliance (S. Chalamalasetti, et al., FPGA’13)
– Deep Learning with COTS HPC (A. Coates, et al., ICML’13)