Taming
Big Data Streams
Hideyuki KAWASHIMA
Center for Computational Sciences
University of Tsukuba, Japan
STORM
Norikra
Jubatus
Relational-stream
XML-stream
S4
Puma
System S
MillWheel
Complex event processing
Machine learning
Incremental computation
Continual query
Spring
(DTW)
CPD
(Change
Point
Detection)
Window-aggregate
Window-join
FPGA GPU
SASE
Esper
Handshake-join
Incr.
LOCI
Online
LDA
Window
Tuple-stream
A Variety of Data Processing Techniques
Cayuga
Privacy Preservation CryptDB
Data mining
Kafka
MLBase
2
Privacy Preservation CryptDB Encryption
GPGPU Intel MIC FPGA Tilera
Privacy Accelerator
ML&DM SQL NoSQL
Relational
stream
Norikra
Online
Data Mining &
Machine Learning
Esper
Puma
Complex
event
processing
Jubatus
Borealis
System S
Window
aggregate
SASE
Cayuga
Continual query & Window
S4
Window
join
CPD
Online
LDA
Tuple
stream
Kafka
STORM
MLBase
Incr.
LOCI
Spring
(DTW)
IBM Facebook
Line
Twitter
NTT & PFI UCB
Which Analytics Style?
• SQL style
– Embedded operators: filter, join, aggregation
– Poor operators, high performance
• NoSQL style
– User-defined operators (Python, Java, C++, …): machine learning, data mining
– Rich operators, low performance
• In-DB style (Oracle-R, MADLib)
– Embedded operators: filter, join, aggregation, machine learning, data mining
– Rich operators, high performance
[Figure: the technology landscape diagram from the earlier slide, repeated]
Falcon
5
MADLib
@UCB
Bismarck
@Stanford
Oracle-R
3 Research Topics
• Falcon
– In-DSMS analytics system
– Multiple query optimization for CPD
• Window Operators
– Join operator over data streams
• Crypt stream
– Privacy preserving stream data processing
6
A Multiple Query Optimization Scheme
for Change Point Detection on Falcon
Joint work with Masahiro OKE
Presented at BIRTE’13
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
DSMS
Relation
eth0
・Destination IP
・Source IP
・Destination Port
・Source Port
・Interface (e.g. eth0)
・Length
・Version (e.g. IPV4 )
・Payload
Relational schema
20
Quick Review
Data Stream Management System (DSMS)
Q1
8
How many packets arrived on port 80 in the last minute?
• A SQL query is translated into an operator tree.
• On arrival of data, the tree is evaluated.
• Operators are based on those of relational databases
– w (Window): cuts a finite relation out of a stream
– σ (Selection): filter
– α (Aggregation): such as AVG, MIN, MAX
[Figure: inside the DSMS, Data → Input adapter → w → σ → α → Output adapter → Result; Users/Apps. submit the Query and receive the Result]
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
9
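The w → σ → α pipeline for query Q1 can be sketched in a few lines of Python. This is only an illustration of the operator-tree idea, not Falcon's or any DSMS's actual implementation; the function name `count_port_80`, the `(timestamp, port)` tuple layout, and the choice to re-evaluate on every arrival are assumptions made for the example.

```python
from collections import deque

def count_port_80(packets, window_seconds=60):
    """For each arriving (timestamp, port) tuple, emit COUNT(*) of port-80
    tuples inside the last `window_seconds` seconds (hypothetical layout)."""
    window = deque()  # tuples currently inside the time window
    for ts, port in packets:
        window.append((ts, port))
        # w (window): drop tuples that have fallen out of the time window
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        # sigma (selection: port = 80) followed by alpha (aggregation: COUNT)
        yield ts, sum(1 for _, p in window if p == 80)

packets = [(0, 80), (10, 22), (30, 80), (65, 80), (100, 443)]
results = list(count_port_80(packets))
print(results)
```

Each arrival triggers one evaluation of the tree, which matches the "on arrival of data, the tree is evaluated" behaviour described on the slide.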
A Target Application: Malware Detection
• Real datasets
– “Anti Malware engineering WorkShop 2013 (MWS 2013)”
– Extracted by NEGI, proposed by Dr. Shinichi Isida.
• NICTER
– Keeps about 160,000 unused IP addresses (darknet)
• Packets sent to the darknet are considered attacks.
– Uses CPD (Change Point Detection) [1] to detect
attacks such as DoS (denial of service).
[1] Daisuke Inoue, K. Yoshioka, M. Eto, Masaya Yamagata, Eisuke Nishino, Jun-ichi Takeuchi,
Kazuya Ohkouchi, Koji Nakao: An Incident Analysis System NICTER and Its Analysis Engines Based
on Data Mining Techniques. ICONIP (1) 2008: 579-586
[2] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change
Points from Time Series,” IEEE TKDE, pp.482-492, 2006.
10
Relational data processing
Attack Detection
Discussion
?• Aggregates are good
CPD(AR)/ LOF / LDA/FIM
Yet Another DSMS: Falcon
Example Query on Falcon (1/2)
• Number of accesses for each port? [1]
• Group-by aggregates
SELECT dst_port,
COUNT(dst_port)
FROM pkt[1 sec]
GROUP BY dst_port
g-pkt
src_ip
dst_ip
src_port
dst_port
seq_no
packet_size
timestamp
protocol
ack
fin
syn
urg
push
reset
content
[Figure: packets with destination ports 22, 80, 15, 80, 22 arrive at the NIC within 1 second; the query outputs 22: 2, 80: 2, 15: 1]
[1] “Enabling Real Time Data Analysis”, Divesh Srivastava (AT&T Labs), et al. Keynote talk, VLDB 2010. (A similar query is found on p. 15 of the talk slides.)
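As a rough illustration of the group-by query above, the following Python sketch counts packets per destination port in consecutive 1-second windows. It assumes that `pkt[1 sec]` behaves as a non-overlapping (tumbling) window and that packets arrive in timestamp order; both are assumptions for the example, and `count_per_port` is a hypothetical name.

```python
from collections import Counter
from itertools import groupby

def count_per_port(packets, window_seconds=1):
    """Yield (window_id, Counter of dst_port) for consecutive windows.
    `packets` is an iterable of (timestamp, dst_port) in timestamp order."""
    def window_of(packet):
        return int(packet[0] // window_seconds)
    for window_id, group in groupby(packets, key=window_of):
        yield window_id, Counter(dst_port for _, dst_port in group)

# Same port sequence as in the slide's figure: 22, 80, 15, 80, 22 in one second.
packets = [(0.1, 22), (0.3, 80), (0.5, 15), (0.7, 80), (0.9, 22), (1.2, 80)]
for window_id, counts in count_per_port(packets):
    print(window_id, dict(counts))
```

The first window reproduces the counts shown on the slide (22: 2, 80: 2, 15: 1).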
12
Example Query on Falcon (2/2)
• Accesses on each port? [2]
• Outlier score for each port per second
select dst_port,
cpd(dst_port)
from pkt[1 sec]
group by dst_port
g-cpd-pkt
src_ip
dst_ip
src_port
dst_port
seq_no
packet_size
timestamp
protocol
ack
fin
syn
urg
push
reset
content
[Figure: packets with destination ports 22, 80, 15, 80, 22 arrive at the NIC within 1 second; the query outputs the scores 22: 1.33, 80: 2.44, 15: 1.22]
[2] “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques”, Daisuke Inoue (NICT), et al. ICONIP (1) 2008: 579-586
13
Dividing CPD into 4 operators
• Input: time series data
• 1st stage learning
• Compute outlier score and moving average score
– Uses the probability provided by 1st stage learning
• 2nd stage learning
• Compute outlier score and moving average score
– Uses the probability provided by 2nd stage learning
– (display of the outlier score is omitted)
Problem of CPD: Parameter setting
• CPD requires 6 parameters (α_R, α_K, α_T, β_R, β_K, β_T)
• Appropriate parameter setting is necessary, but it is difficult
[Figure: two plots, one using an appropriate parameter set and one using an inappropriate parameter set. Blue: number of accesses, red: CPD score]
A simple way for parameter tuning:
--- Multiple CPDs with different parameter sets ---
[Figure: each input packet is fed to several CPD instances, one per parameter set. Each instance runs 1st stage learning, compute outlier score, 2nd stage learning, compute outlier score.]
• Result aggregation (e.g. majority voting)
Issue: How to accelerate multiple CPD executions?
Approach: Multiple query optimization
The 4 sharing patterns
-- Only branch cases, not merge --
[Figure: four operator-tree diagrams, one per sharing pattern. Each tree is built from the operators "1st stage learning", "Compute outlier score", "2nd stage learning", and "Compute outlier score", and shows how far the operators are shared before the tree branches.]
NOTE: “1st stage learning” and “2nd stage learning” can be divided into sub-operators, and some of the
sub-operators can also be shared. The sharing patterns are described in the paper.
Pattern 1: Sharing CPD-1 if α_R and α_K are the same.
Pattern 2: Sharing CPD-1, 2 if α_R, α_K and α_T are the same.
Pattern 3: Sharing CPD-1, 2, 3 if α_R, α_K, α_T, β_R and β_K are the same.
Pattern 4: Sharing CPD-1, 2, 3, 4 if α_R, α_K, α_T, β_R, β_K and β_T are the same.
Pattern 1 Pattern 2 Pattern 3 Pattern 4
17
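The four sharing rules can be read as prefix sharing over the parameter tuple: an operator can be shared by two queries when every parameter it depends on is identical. The sketch below counts how many operator instances remain after sharing. It is a simplified illustration built only from the four pattern rules listed above, not the optimizer described in the paper; `operator_keys` and `count_operators` are hypothetical names.

```python
def operator_keys(params):
    """Map each of the four CPD operators to the parameters it depends on,
    following Patterns 1 to 4 above."""
    a_r, a_k, a_t, b_r, b_k, b_t = params
    return [
        ("CPD-1", (a_r, a_k)),
        ("CPD-2", (a_r, a_k, a_t)),
        ("CPD-3", (a_r, a_k, a_t, b_r, b_k)),
        ("CPD-4", (a_r, a_k, a_t, b_r, b_k, b_t)),
    ]

def count_operators(param_sets):
    """Return (naive, shared) operator counts for a list of parameter sets."""
    naive = 4 * len(param_sets)
    # Operators with the same name and the same parameter prefix are merged.
    shared = len({key for params in param_sets for key in operator_keys(params)})
    return naive, shared

param_sets = [
    (0.02, 2, 5, 0.02, 3, 5),
    (0.02, 2, 5, 0.02, 4, 5),  # shares CPD-1 and CPD-2 with the first set
    (0.02, 2, 7, 0.02, 3, 5),  # shares only CPD-1 with the first set
]
naive, shared = count_operators(param_sets)
print(naive, shared)
```

With these three example parameter sets, 12 naive operator instances reduce to 9 after sharing.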
Experiment
Measuring execution time when sharing ONLY
1st stage learning
– CPD is implemented in C++ with the Eigen library (for
matrix manipulation).
– Execution time is measured using this CPD implementation.
18
Exec. Time with Sharing 1st Stage Learning
Execution times are in seconds; performance gain is in times (Naive / Shared CPD-1).
ID   α_R    α_K   α_T   β_R    β_K   β_T   Naive   Shared CPD-1   Gain
1    .02    2     5     .02    3     5     2.92    1.77           1.65
2    .02    4     5     .02    3     5     3.65    1.77           2.06
3    .02    2     5     .02    4     5     3.29    2.17           1.52
4    .005   2     5     .02    4     5     2.91    1.76           1.65
5    .02    2     5     .005   3     5     2.89    1.77           1.64
6    .02    2     7     .02    7     5     3.00    1.87           1.60
7    .02    1     5     .02    1     5     1.96    1.11           1.78
8    .02    10    10    .02    10    10    11.2    5.84           1.92
9    .02    1     10    .02    10    10    6.66    5.80           1.15
10   .02    10    10    .02    1     10    6.68    1.34           5.00
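The performance gain column is the ratio of the naive execution time to the shared execution time. A quick check in Python, using only the numbers from the table, shows that the recomputed ratios agree with the reported gains to within about 0.02; the small differences are presumably because the reported gains were computed from unrounded times, which is an assumption.

```python
# (naive seconds, shared CPD-1 seconds, reported gain) per experiment ID
rows = {
    1: (2.92, 1.77, 1.65), 2: (3.65, 1.77, 2.06), 3: (3.29, 2.17, 1.52),
    4: (2.91, 1.76, 1.65), 5: (2.89, 1.77, 1.64), 6: (3.00, 1.87, 1.60),
    7: (1.96, 1.11, 1.78), 8: (11.2, 5.84, 1.92), 9: (6.66, 5.80, 1.15),
    10: (6.68, 1.34, 5.00),
}

# Gain = naive time / shared time
gains = {exp_id: naive / shared for exp_id, (naive, shared, _) in rows.items()}
max_diff = max(abs(gains[exp_id] - reported)
               for exp_id, (_, _, reported) in rows.items())

for exp_id, gain in gains.items():
    print(exp_id, round(gain, 2))
```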
Parallel execution with accelerator is required.
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators accelerated by FPGA
– Join operator over data streams
• Crypt stream
– Privacy preserving stream data processing
20
A Novel Architecture of
Merging Network
for Handshake Join
Joint work with
Yasin OGE, Takefumi MIYOSHI, Tsutomu YOSHINAGA
Presented at
ICNC’11, FPL’11, MCSoC’12, SSDBM’13.
Falcon
- UDP-RX
- Window join (64-cores)
Performance
Monitor
22
FPGA instead of Data Center
Demo System at SSDBM’13
Example | Simple Continuous Query
SELECT *
FROM S1 [Rows 100], S2 [Rows 100]
WHERE S1.key = S2.key
[Figure: streams S1 and S2 carry (key, value) tuples]
Window Operators
[Figure: the FROM clause corresponds to one window operator (w) per stream]
Join Operator
[Figure: the WHERE clause corresponds to a join operator over the two windows]
Overall Query Plan
[Figure: S1 and S2 each feed a window operator (w); both windows feed the join]
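The query plan above can be sketched in software as a count-based sliding-window equi-join. This is a plain nested-loop style sketch for illustration, not the handshake join and not the FPGA implementation presented in this talk; the `(stream, key, value)` event layout and the name `window_join` are assumptions.

```python
from collections import deque

def window_join(events, rows=100):
    """Join S1 and S2 on key over the latest `rows` tuples of each stream.
    `events` is an iterable of (stream_name, key, value) in arrival order."""
    windows = {"S1": deque(maxlen=rows), "S2": deque(maxlen=rows)}
    for stream, key, value in events:
        other = "S2" if stream == "S1" else "S1"
        # Probe the opposite window for tuples with a matching key.
        for other_key, other_value in windows[other]:
            if other_key == key:
                if stream == "S1":
                    yield (key, value, other_value)
                else:
                    yield (key, other_value, value)
        # Insert the new tuple; deque(maxlen=rows) evicts the oldest one.
        windows[stream].append((key, value))

events = [("S1", 1, "a"), ("S2", 1, "x"), ("S2", 2, "y"),
          ("S1", 2, "b"), ("S1", 1, "c")]
results = list(window_join(events))
print(results)
```

Each result is emitted as (key, S1 value, S2 value), regardless of which stream the triggering tuple arrived on.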
HANDSHAKE JOIN
J. Teubner and R. Müller, SIGMOD’11
Handshake Join
• Basic idea: streams flow in opposite directions
• Advantage: highly parallel evaluation
[Figure: input stream 1 and input stream 2 enter the window from opposite ends]
Handshake Join
[Figure: the window of stream 1 and the window of stream 2 lie side by side; in each window, new tuples enter at one end and old tuples leave at the other]
Handshake Join | Parallelization
• With two processors (Processor1, Processor2), the windows are divided into two sub-windows
• With three processors (Processor1, Processor2, Processor3), the windows are divided into three sub-windows
Naïve
IMPLEMENTATION
Baseline Implementation
• Join cores: dedicated processors for the join operation
• Results are stored in buffers at the join cores
• A merging network (a tree of Mergers) collects the results from the join cores
Output Data Flow
[Figure: results move from the join cores up through the Mergers in steps 1, 2, 3]
Issue | Inefficient Buffer Utilization
• In the case of 64 cores, only 5% of usage
PROPOSED
IMPLEMENTATION
Adaptive Merging Network
Proposed Implementation
• Join cores: here, no buffering is required
• Ring Nodes are added above the join cores; results are now stored in the Ring Node buffers
• A merging network (Mergers) sits above the Ring Nodes
Output Data Flow
[Figure: results move through the Ring Nodes and Mergers in steps 1 to 4]
• Up to 100% utilization
Performance | Proposed (64 join cores)
[Figure: throughput (million tuples/sec, axis from 0 to 3.5) versus match rate (10% to 100%) for the nested loop, baseline, and proposed implementations]
Falcon
- UDP-RX
- Window join (64-cores)
Performance
Monitor
52
Basic: 6.7 million tuples per second
Proposal: 14.6 million tuples per second
Wire-Speed Implementation of
Sliding-Window Aggregate Operator
over Out-of-Order Data Streams
Int’l Symposium on Embedded Multicore/Many-core SoCs
Sept. 26– 28, 2013, Tokyo
Yasin Oge∗, Masato Yoshimi ∗, Takefumi Miyoshi†, Hideyuki Kawashima‡,
Hidetsugu Irie ∗, and Tsutomu Yoshinaga∗
∗Univ. of Electro-Communications
†e-trees.Japan, Inc.
‡Univ. of Tsukuba
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
• CryptStream
– Privacy preserving stream data processing
54
A Security aware Stream Data
Processing Scheme on the Cloud
Joint work with
Katsuhiro TOMIYAMA
Introduced at
CloudDB’11
Related work: CryptDB
• CryptDB [R. A. Popa, et al. SOSP’11]
– Realizes relational operators over encrypted data
on a traditional RDBMS
– Our research goal is to achieve this on an SPE (stream processing engine)
– Encrypts each value of the data stored in the DBMS
• Uses more than one type of encryption, each with different characteristics
⇒ Three kinds of ciphertext are generated for each plain value
[Figure: the client and the database proxy are in the trusted area; the DBMS, with UDFs such as DECRYPT_RND, DECRYPT_DET, …, is in the untrusted area. The plain value val = 80 is stored as the encrypted values val-DET = DET(80), val-OPE = OPE(80), val-HOM = HOM(80).]
Ciphers of CryptDB/CryptStream
• DET (Deterministic)
– Allows checking the equality of two encrypted values.
– x = y ⇔ DET_K(x) = DET_K(y)
• OPE (Order-preserving)
– Allows checking the inequality (order) of two encrypted values.
– x < y ⇔ OPE_K(x) < OPE_K(y)
• HOM (Homomorphic)
– Allows executing addition over two
encrypted values.
– HOM_K(x) ∙ HOM_K(y) = HOM_K(x + y)
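The DET and HOM properties can be tried out with a toy Python sketch. Everything here is an illustrative stand-in: the HMAC-based `det` function is a keyed deterministic tag (it cannot be decrypted, unlike a real DET cipher), and the HOM part is a textbook Paillier scheme with tiny hard-coded primes, which is insecure and only demonstrates the algebra. Because Paillier encryption is randomized, the HOM identity holds after decryption rather than as ciphertext equality. OPE is omitted. None of the names below come from CryptDB or CryptStream.

```python
import hashlib
import hmac
import math
import secrets

def det(key: bytes, value: int) -> str:
    """Deterministic keyed tag: equal plaintexts give equal outputs."""
    return hmac.new(key, str(value).encode(), hashlib.sha256).hexdigest()

# Toy Paillier parameters (tiny primes, for illustration only).
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse, requires Python 3.8+

def hom_encrypt(m: int) -> int:
    """Encrypt m (0 <= m < N) with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def hom_decrypt(c: int) -> int:
    """Recover m from a ciphertext c."""
    return (((pow(c, LAM, N2) - 1) // N) * MU) % N

key = b"demo-key"
same = det(key, 80) == det(key, 80)        # DET: equality is preserved
product = (hom_encrypt(80) * hom_encrypt(32)) % N2
total = hom_decrypt(product)               # HOM: product decrypts to the sum
print(same, total)
```

Multiplying the two ciphertexts and decrypting yields 80 + 32 = 112, which is the additive property stated for HOM.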
Scheme of CryptStream
[Figure: encryption modules in trusted areas send encrypted streams to a stream processing engine on a public cloud (untrusted area); a decryption module in the client's trusted area recovers the result]
• Side effect of encryption: increased tuple size
– Three kinds of ciphertext are generated for each value
• Already implemented on Falcon
• Performance improvement is left for future work
– Partially resolved by our work at CloudDB’11.
Encrypted stream data processing scheme
• Plain tuple: id = 1, temp = 32
• Encrypted tuple: id-DET = DET(1), id-OPE = OPE(1), id-HOM = HOM(1), temp-DET = DET(32), temp-OPE = OPE(32), temp-HOM = HOM(32)
3 Research Topics
• Falcon
– In-DSMS analytics system
• Window Operators
– Join operator over data streams
– Aggregate operator over data streams
• Crypt stream
– Privacy preserving stream data processing
60
These 3 research topics are in progress...
Summary
[Figure: the technology landscape diagram shown earlier, repeated as a summary; MillWheel appears next to System S]
Frontiers for Accelerators
• SQL
– Types of relational operators are limited.
– Netezza, Exadata
• NoSQL, Privacy, DM&ML
– New techs are created every day!
– An FPGA memcached appliance: S. Chalamalasetti, et al., FPGA'13
– Deep Learning with COTS HPC: A. Coates, et al., ICML’13
• Blue ocean, many chances! Red ocean, but big return!

Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

MCSoC'13 Keynote Talk "Taming Big Data Streams"

  • 1. Taming Big Data Streams Hideyuki KAWASHIMA Center for Computational Sciences University of Tsukuba, Japan
  • 2. A Variety of Data Processing Techniques: STORM, Norikra, Jubatus, Relational-stream, XML-stream, S4, Puma, System S, MillWheel, Complex event processing, Machine learning, Incremental computation, Continual query, Spring (DTW), CPD (Change Point Detection), Window-aggregate, Window-join, FPGA, GPU, SASE, Esper, Handshake-join, Incr. LOCI, Online LDA, Window, Tuple-stream, Cayuga, Privacy Preservation, CryptDB, Data mining, Kafka, MLBase
  • 3. [Diagram: map of techniques and systems] Privacy Preservation, CryptDB, GPGPU, Intel MIC, FPGA, Tilera, Encryption, Privacy, Accelerator, ML&DM, SQL, NoSQL, Relational stream, Norikra, Online Data Mining & Machine Learning, Esper, Puma, Complex event processing, Jubatus, Borealis, System S, Window aggregate, SASE, Cayuga, Continual query & Window, S4, Window join, CPD, Online LDA, Tuple stream, Kafka, STORM, MLBase, Incr. LOCI, Spring (DTW). Organizations shown: IBM, Facebook, Line, Twitter, NTT & PFI, UCB
  • 4. Which Analytics Style? • SQL style: embedded operators (filter, join, aggregation); poor operators, high performance • NoSQL style: user-defined operators (Python, Java, C++, …); rich operators, low performance • Machine learning and data mining are shown for both styles • In-DB style: embedded operators (filter, join, aggregation, machine learning, data mining); rich operators, high performance; examples: Oracle-R, MADLib
  • 5. [Diagram: the same map, with Falcon and in-DB analytics systems added] Relational stream, Norikra, Online Data Mining & Machine Learning, Esper, Puma, Complex event processing, Jubatus, Borealis, System S, Window aggregate, SASE, Cayuga, Continual query & Window, S4, Window join, CPD, Online LDA, Tuple stream, Privacy Preservation, CryptDB, GPGPU, Intel MIC, FPGA, Tilera, Encryption, Privacy, Accelerator, ML&DM, SQL, NoSQL, Kafka, STORM, MLBase, Incr. LOCI, Spring (DTW), Falcon, MADLib @UCB, Bismarck @Stanford, Oracle-R
  • 6. 3 Research Topics • Falcon – In-DSMS analytics system – Multiple query optimization for CPD • Window Operators – Join operator over data streams • Crypt stream – Privacy preserving stream data processing 6
  • 7. A Multiple Query Optimization Scheme for Change Point Detection on Falcon Joint work with Masahiro OKE Presented at BIRTE’13
  • 8. Quick Review: Data Stream Management System (DSMS) • Q1: How many packets arrive for port 80 in a minute? SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80 • Relational schema of relation eth0: Destination IP, Source IP, Destination Port, Source Port, Interface (e.g. eth0), Length, Version (e.g. IPv4), Payload
  • 9. • SQL is translated into an operator tree. • On arrival of data, the tree is evaluated. • Operators are based on relational databases – w (Window): cuts relations out of a stream – σ (Selection): filter – α (Aggregation): such as AVG, MIN, MAX • Data flow inside the DSMS: Data → Input adapter → w → σ → α → Output adapter → Query result → Users/Apps. • Example: SELECT COUNT(*) FROM eth0[TIME 1 MIN] WHERE port = 80
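The w → σ → α pipeline for Q1 can be sketched in a few lines of Python. This is only an illustration under assumptions, not Falcon's or any DSMS's actual implementation: the class name `TimeWindowCount`, the dict-shaped tuples, and the choice to re-evaluate on every arrival are all hypothetical.

```python
from collections import deque


class TimeWindowCount:
    """Hypothetical operator tree for: SELECT COUNT(*) FROM s[TIME n SEC] WHERE pred."""

    def __init__(self, window_sec, predicate):
        self.window_sec = window_sec
        self.predicate = predicate
        self.window = deque()  # (timestamp, tuple) pairs, oldest first

    def on_arrival(self, ts, tup):
        # w (window): add the new tuple, evict tuples that fell out of the window
        self.window.append((ts, tup))
        while self.window and self.window[0][0] <= ts - self.window_sec:
            self.window.popleft()
        # sigma (selection) followed by alpha (aggregation, here COUNT)
        return sum(1 for _, t in self.window if self.predicate(t))


# Q1 from the slides: packets for port 80 within the last minute
q1 = TimeWindowCount(60, lambda t: t["port"] == 80)
for ts, port in [(0, 80), (10, 22), (30, 80), (61, 80)]:
    print(ts, q1.on_arrival(ts, {"port": port}))
```

A production engine would maintain the count incrementally instead of rescanning the window; the rescan is kept here so that the three operators stay visible.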
  • 10. A Target Application: Malware Detection • Real datasets – “Anti Malware engineering WorkShop 2013 (MWS 2013)” – Extracted by NEGI, proposed by Dr. Shinichi Isida. • NICTER – Keeps about 160,000 unused IP addresses (darknet) • Packets sent to the darknet are considered attacks. – Uses CPD (Change Point Detection) [1] to detect attacks such as DoS (denial of service). [1] Daisuke Inoue, K. Yoshioka, M. Eto, Masaya Yamagata, Eisuke Nishino, Jun-ichi Takeuchi, Kazuya Ohkouchi, Koji Nakao: An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques. ICONIP (1) 2008: 579-586 [2] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE TKDE, pp.482-492, 2006.
  • 11. Yet Another DSMS: Falcon • Relational data processing: aggregates are good • Attack detection: CPD(AR) / LOF / LDA / FIM • Discussion?
  • 12. Example Query on Falcon (1/2) • Number of accesses for each port? [1] • Group-by aggregates: SELECT dst_port, COUNT(dst_port) FROM pkt[1 sec] GROUP BY dst_port • Operator: g-pkt • Schema of pkt: src_ip, dst_ip, src_port, dst_port, seq_no, packet_size, timestamp, protocol, ack, fin, syn, urg, push, reset, content • Example: packets for ports 22, 80, 15, 80, 22 arriving at the NIC within 1 second yield 22: 2, 80: 2, 15: 1 [1] “Enabling Real Time Data Analysis”, Divesh Srivastava (AT&T Labs), et al. Keynote talk, VLDB 2010. (A similar query is found on p. 15 of the talk slides.)
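As a rough illustration of the group-by query above, the following Python sketch counts packets per destination port for each one-second window. It assumes tumbling (non-overlapping) windows and `(timestamp, dst_port)` pairs as input; the slides do not say whether `pkt[1 sec]` is tumbling or sliding, and the function name is hypothetical.

```python
from collections import Counter
from itertools import groupby


def per_second_port_counts(packets):
    """Yield (second, Counter of dst_port) for each 1-second tumbling window.

    packets: iterable of (timestamp_sec, dst_port), ordered by timestamp.
    """
    for second, group in groupby(packets, key=lambda p: int(p[0])):
        yield second, Counter(port for _, port in group)


# The five packets from the slide (ports 22, 80, 15, 80, 22), plus one later packet
packets = [(0.1, 22), (0.2, 80), (0.5, 15), (0.7, 80), (0.9, 22), (1.2, 80)]
result = list(per_second_port_counts(packets))
for second, counts in result:
    print(second, dict(counts))
```

Replacing `Counter` with a per-port CPD instance gives the shape of the second example query, where the aggregate is an outlier score instead of a count.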
  • 13. Example Query on Falcon (2/2) • Accesses on each port? [2] • Outlier score for each port per second: select dst_port, cpd(dst_port) from pkt[1 sec] group by dst_port • Operator: g-cpd-pkt • Schema of pkt: src_ip, dst_ip, src_port, dst_port, seq_no, packet_size, timestamp, protocol, ack, fin, syn, urg, push, reset, content • Example: packets for ports 22, 80, 15, 80, 22 arriving at the NIC within 1 second yield 22: 1.33, 80: 2.44, 15: 1.22 [2] “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques”, Daisuke Inoue (NICT), et al. ICONIP (1) 2008: 579-586
  • 14. Dividing CPD into 4 operators: 1st stage learning → compute outlier score and moving average score → 2nd stage learning → compute outlier score and moving average score (display of the outlier score omitted). Diagram labels: input time series data; input x_t; probability provided by 1st stage learning; probability provided by 2nd stage learning; outlier score; moving average score
  • 15. Problem of CPD: Parameter setting • CPD requires 6 parameters (α_R, α_K, α_T, β_R, β_K, β_T) • Appropriate parameter setting is necessary, but it is difficult – [Plots: one using an appropriate parameter set, one using an inappropriate parameter set; blue: number of accesses, red: CPD score]
  • 16. A simple way for parameter tuning: multiple CPDs with different parameter sets • Each input packet is fed to several CPD instances (1st stage learning → compute outlier score → 2nd stage learning → compute outlier score), one per parameter set (labelled Parameterset 2, Parameterset 3, Parameterset 4, Parameterset 0k on the slide) • Result aggregation (e.g. majority voting) • Issue: How to accelerate multiple CPD executions? • Approach: Multiple query optimization
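The "several detectors plus majority voting" idea can be sketched as follows. Note the loud assumption: `DiscountedMeanDetector` is a deliberately simple stand-in and is NOT the two-stage CPD algorithm from the talk; its parameters `r` and `threshold`, the parameter values, and the input series are all made up for illustration. Only the aggregation step reflects the slide.

```python
class DiscountedMeanDetector:
    """Stand-in for one CPD instance (NOT the two-stage CPD algorithm).

    Flags a point when it deviates from a discounted running mean by more
    than `threshold`; `r` is the discounting rate of the running mean.
    """

    def __init__(self, r, threshold):
        self.r = r
        self.threshold = threshold
        self.mean = None

    def update(self, x):
        if self.mean is None:
            self.mean = x
            return False
        score = abs(x - self.mean)
        self.mean = (1 - self.r) * self.mean + self.r * x
        return score > self.threshold


def majority_vote(flags):
    """Result aggregation: True if more than half of the detectors raised a flag."""
    return sum(flags) * 2 > len(flags)


# Three hypothetical parameter sets, one detector per set
detectors = [DiscountedMeanDetector(r, th)
             for r, th in [(0.02, 5.0), (0.005, 5.0), (0.02, 50.0)]]
series = [10, 11, 10, 12, 40, 41]  # made-up access counts with a jump
alarms = [majority_vote([d.update(x) for d in detectors]) for x in series]
print(alarms)
```

Every detector processes every input, which is exactly the cost the multiple query optimization on the following slides tries to reduce.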
  • 17. The 4 sharing patterns (only branch cases, not merge) • [Diagram: four operator trees, one per pattern, built from “1st stage learning”, “compute outlier score” and “2nd stage learning” operators] • Pattern 1: Sharing CPD-1 if α_R and α_K are the same. • Pattern 2: Sharing CPD-1, 2 if α_R, α_K and α_T are the same. • Pattern 3: Sharing CPD-1, 2, 3 if α_R, α_K, α_T, β_R and β_K are the same. • Pattern 4: Sharing CPD-1, 2, 3, 4 if α_R, α_K, α_T, β_R, β_K and β_T are the same. • NOTE: “1st stage learning” and “3rd stage learning” can be divided into sub-operators, and some of the sub-operators can also be shared. The sharing patterns are described in the paper.
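The four pattern conditions amount to a prefix comparison over the parameter tuple. A minimal sketch, assuming parameters are ordered (α_R, α_K, α_T, β_R, β_K, β_T); the names `CpdParams` and `shared_operators` are hypothetical and not taken from Falcon:

```python
from typing import NamedTuple


class CpdParams(NamedTuple):
    alpha_r: float
    alpha_k: int
    alpha_t: int
    beta_r: float
    beta_k: int
    beta_t: int


# Number of leading parameters that must match to share CPD-1 .. CPD-i
# (Pattern 1: alpha_R, alpha_K; Pattern 2: + alpha_T;
#  Pattern 3: + beta_R, beta_K; Pattern 4: + beta_T)
_PREFIX_LEN = (2, 3, 5, 6)


def shared_operators(a, b):
    """Return how many leading CPD operators (0 to 4) two queries can share."""
    shared = 0
    for pattern, n in enumerate(_PREFIX_LEN, start=1):
        if a[:n] != b[:n]:
            break
        shared = pattern
    return shared


p1 = CpdParams(0.02, 2, 5, 0.02, 3, 5)
p3 = CpdParams(0.02, 2, 5, 0.02, 4, 5)  # differs only in beta_K
print(shared_operators(p1, p3))
```

With more than two queries, the same prefix comparison could be used to build a trie over parameter tuples, so that each shared prefix is evaluated once.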
  • 18. Experiment: measuring execution time when sharing ONLY 1st stage learning – CPD was implemented in C++ with the Eigen library (for matrix manipulation). – Execution time was measured using this CPD implementation.
  • 19. Exec. Time with Sharing 1st Stage Learning
      ID | α_R  | α_K | α_T | β_R  | β_K | β_T | Naive (s) | Shared CPD-1 (s) | Performance gain, shared CPD-1 (times)
      1  | .02  | 2   | 5   | .02  | 3   | 5   | 2.92      | 1.77             | 1.65
      2  | .02  | 4   | 5   | .02  | 3   | 5   | 3.65      | 1.77             | 2.06
      3  | .02  | 2   | 5   | .02  | 4   | 5   | 3.29      | 2.17             | 1.52
      4  | .005 | 2   | 5   | .02  | 4   | 5   | 2.91      | 1.76             | 1.65
      5  | .02  | 2   | 5   | .005 | 3   | 5   | 2.89      | 1.77             | 1.64
      6  | .02  | 2   | 7   | .02  | 7   | 5   | 3.00      | 1.87             | 1.60
      7  | .02  | 1   | 5   | .02  | 1   | 5   | 1.96      | 1.11             | 1.78
      8  | .02  | 10  | 10  | .02  | 10  | 10  | 11.2      | 5.84             | 1.92
      9  | .02  | 1   | 10  | .02  | 10  | 10  | 6.66      | 5.80             | 1.15
      10 | .02  | 10  | 10  | .02  | 1   | 10  | 6.68      | 1.34             | 5.00
    Parallel execution with an accelerator is required.
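The performance gain column appears to be the naive execution time divided by the shared execution time. The short check below recomputes it from the table values; the 0.02 tolerance is an assumption, chosen because the published times are themselves rounded, so a few rows differ in the last digit.

```python
# (naive_sec, shared_sec, reported_gain) per table row ID, copied from slide 19
ROWS = {
    1: (2.92, 1.77, 1.65),
    2: (3.65, 1.77, 2.06),
    3: (3.29, 2.17, 1.52),
    4: (2.91, 1.76, 1.65),
    5: (2.89, 1.77, 1.64),
    6: (3.00, 1.87, 1.60),
    7: (1.96, 1.11, 1.78),
    8: (11.2, 5.84, 1.92),
    9: (6.66, 5.80, 1.15),
    10: (6.68, 1.34, 5.00),
}


def gain(naive_sec, shared_sec):
    """Speed-up of the shared plan over the naive plan."""
    return naive_sec / shared_sec


for row_id, (naive, shared, reported) in ROWS.items():
    computed = gain(naive, shared)
    # Tolerance is an assumption: inputs are rounded to 2-3 significant digits
    assert abs(computed - reported) < 0.02, row_id
    print(row_id, round(computed, 2), reported)
```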
  • 20. 3 Research Topics • Falcon – In-DSMS analytics system • Window Operators accelerated by FPGA – Join operator over data streams • Crypt stream – Privacy preserving stream data processing
  • 21. A Novel Architecture of Merging Network for Handshake Join Joint work with Yasin OGE, Takefumi MIYOSHI, Tsutomu YOSHINAGA Presented at ICNC’11, FPL’11, MCSoC’12, SSDBM’13.
  • 22. Falcon - UDP-RX - Window join (64-cores) Performance Monitor 22 FPGA instead of Data Center Demo System at SSDBM’13
  • 23. Example | Simple Continuous Query: SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key. Each tuple has the form (key, value).
  • 24. Window Operators: SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key. The FROM clause defines one window operator (w) per stream over (key, value) tuples.
  • 25. Join Operator: SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key. The WHERE clause defines the join operator applied to the two windows.
  • 26. Overall Query Plan: SELECT * FROM S1 [Rows 100], S2 [Rows 100] WHERE S1.key = S2.key. Two window operators (w) feed one join operator over (key, value) tuples.
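The query plan above (two row-based windows feeding an equi-join) can be sketched in software as a symmetric window join. This is a single-threaded illustration only, not the handshake join or the FPGA design presented in the following slides; the class name `RowWindowJoin` and the `(key, value)` tuple layout are assumptions.

```python
from collections import deque


class RowWindowJoin:
    """Sketch of: SELECT * FROM S1 [Rows n], S2 [Rows n] WHERE S1.key = S2.key."""

    def __init__(self, rows):
        # One sliding window per stream; deque(maxlen=...) drops the oldest tuple
        self.windows = (deque(maxlen=rows), deque(maxlen=rows))

    def on_arrival(self, side, tup):
        """side: 0 for S1, 1 for S2; tup: (key, value). Returns new join results."""
        own, other = self.windows[side], self.windows[1 - side]
        own.append(tup)
        # Probe the opposite window for tuples with the same key
        matches = [o for o in other if o[0] == tup[0]]
        if side == 0:
            return [(tup, o) for o in matches]
        return [(o, tup) for o in matches]


join = RowWindowJoin(rows=100)
join.on_arrival(0, (1, "a"))
print(join.on_arrival(1, (1, "x")))
```

Each arrival scans the whole opposite window, so the cost per tuple grows with the window size; that scan is the part a parallel design can split across cores.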
  • 27. HANDSHAKE JOIN J. Teubner and R. Müller, SIGMOD’11
  • 28. Handshake Join • Basic idea: the two input streams flow through the window in opposite directions • Advantage: highly parallel evaluation
  • 29. Handshake Join 29 window of stream 2 old tuple new tuple new tuple old tuple window of stream 1
  • 31. Handshake Join|Parallelization 31 Processor1 Processor2 Processor3 divided into three sub-windows
  • 33. Baseline Implementation Join Core Join Core Join Core Join Core 33 dedicated processor for join operation
  • 34. Baseline Implementation Join Core Join Core Join Core Join Core buffer 34 results are stored in these buffers
  • 35. Baseline Implementation Merger MergerMerger Join Core Join Core Join Core Join Core 35 buffer MergingNetwork
  • 36. Output Data Flow Merger MergerMerger Join Core Join Core Join Core Join Core 36 3 2 1
  • 37. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 37 Join Core 3 2 1
  • 38. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 38 Join Core
  • 39. Merger MergerMerger Join Core Join Core Join Core Issue|Inefficient Buffer Utilization 39 Join Core
  • 40. Issue | Inefficient Buffer Utilization: Merger, Merger, Merger; Join Core ×4. In the case of 64 cores, utilization is only 5%.
  • 42. Proposed Implementation Join Core Join Core Join Core Join Core 42
  • 43. Proposed Implementation Join Core Join Core Join Core Join Core 43 Here, no buffering is required! NO buffer
  • 45. Proposed Implementation Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 45 buffer Now, results are stored in these buffers!
  • 50. Merger MergerMerger Ring Node Ring Node Ring Node Ring Node Join Core Join Core Join Core Join Core 50 21 3 4up to 100% utilization
  • 51. Performance | Proposed (64 join cores) [Plot: throughput (million tuples/sec, 0 to 3.5) versus match rate (10% to 100%); series: nested loop, baseline, proposed]
  • 52. Falcon - UDP-RX - Window join (64-cores) Performance Monitor • Basic: 6.7 million tuples per second • Proposal: 14.6 million tuples per second
  • 53. Wire-Speed Implementation of Sliding-Window Aggregate Operator over Out-of-Order Data Streams Int’l Symposium on Embedded Multicore/Many-core SoCs Sept. 26– 28, 2013, Tokyo Yasin Oge∗, Masato Yoshimi ∗, Takefumi Miyoshi†, Hideyuki Kawashima‡, Hidetsugu Irie ∗, and Tsutomu Yoshinaga∗ ∗Univ. of Electro-Communications †e-trees.Japan, Inc. ‡Univ. of Tsukuba
  • 54. 3 Research Topics • Falcon – In-DSMS analytics system • Window Operators – Join operator over data streams • CryptStream – Privacy preserving stream data processing 54
  • 55. A Security aware Stream Data Processing Scheme on the Cloud Joint work with Katsuhiro TOMIYAMA Introduced at CloudDB’11
  • 56. Related work: CryptDB • CryptDB [R. A. Popa, et al. SOSP’11] – Realizes relational operators over encrypted data on a traditional RDBMS (our research goal is to achieve this on an SPE) – Encrypts each value of the data stored in the DBMS • Uses more than one type of encryption, each with different characteristics ⇒ three kinds of cipher are generated for each plain value • Example: plain value val = 80 becomes the encrypted values val-DET = DET(80), val-OPE = OPE(80), val-HOM = HOM(80) • Architecture: Client and database proxy (trusted area); DBMS with UDFs DECRYPT_RND, DECRYPT_DET, … (untrusted area)
  • 57. Ciphers of CryptDB/CryptStream • DET (Deterministic) – Able to check the equality of two encrypted values. – x = y ⇔ DET_K(x) = DET_K(y) • OPE (Order-preserving) – Able to check the inequality of two encrypted values. – x < y ⇔ OPE_K(x) < OPE_K(y) • HOM (Homomorphic) – Able to execute addition over two encrypted values. – HOM_K(x) · HOM_K(y) = HOM_K(x + y)
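The three properties can be demonstrated with toy constructions. Everything below is an assumption for illustration and is neither what CryptDB or CryptStream implement nor secure: DET is approximated by a keyed hash (HMAC, which cannot be decrypted), OPE by a keyed sum of positive gaps over small non-negative integers, and HOM by textbook Paillier with tiny primes. Since Paillier is randomized, the HOM identity holds after decryption rather than on raw ciphertexts. `pow(x, -1, n)` needs Python 3.8 or later.

```python
import hashlib
import hmac
import math
import random

KEY = b"demo-key"  # hypothetical key, for illustration only


def det(x):
    """Deterministic tag: equal plaintexts give equal outputs."""
    return hmac.new(KEY, str(x).encode(), hashlib.sha256).hexdigest()


def _gap(i):
    # Keyed pseudo-random positive gap in 1..256
    digest = hmac.new(KEY, b"ope" + str(i).encode(), hashlib.sha256).digest()
    return 1 + digest[0]


def ope(x):
    """Toy order-preserving encoding for small non-negative ints (O(x) time)."""
    return sum(_gap(i) for i in range(x + 1))


# Textbook Paillier with tiny primes (insecure, demonstration only)
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def hom(m):
    """Encrypt m; multiplying two ciphertexts adds the plaintexts."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def hom_decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


print(det(80) == det(80), ope(32) < ope(80))
print(hom_decrypt(hom(80) * hom(32) % N2))  # decrypts to 80 + 32
```

In the CryptStream setting this is why each plain value expands into three ciphertext columns: each operator (equality, comparison, sum) needs a different one.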
  • 58. Scheme of CryptStream Encryption module Encryption module Encryption module Trusted area Trusted area Trusted area Public cloud (Untrusted area) Trusted area Decryption module ResultResult Stream processing engine 2012/10/29 Client
  • 59. Encrypted stream data processing scheme • Side effect of encryption: increased tuple size – Three kinds of cipher are generated for each value • Already implemented on Falcon • Performance improvement is left for future work – Partially resolved by our work at CloudDB’11. • Example: the plain tuple (id = 1, temp = 32) becomes id-DET = DET(1), id-OPE = OPE(1), id-HOM = HOM(1), temp-DET = DET(32), temp-OPE = OPE(32), temp-HOM = HOM(32)
  • 60. 3 Research Topics • Falcon – In-DSMS analytics system • Window Operators – Join operator over data streams – Aggregate operator over data streams • Crypt stream – Privacy preserving stream data processing • These three research projects are in progress.
  • 62. [Diagram: map of techniques and systems] Relational stream, Norikra, Online Data Mining & Machine Learning, Esper, Puma, Complex event processing, Jubatus, MillWheel, System S, Window aggregate, SASE, Cayuga, Continual query & Window, S4, Window join, CPD, Online LDA, Tuple stream, Privacy Preservation, CryptDB, GPGPU, Intel MIC, FPGA, Tilera, Encryption, Privacy, Accelerator, ML&DM, SQL, NoSQL, Kafka, STORM, MLBase, Incr. LOCI, Spring (DTW)
  • 63. Frontiers for Accelerators • SQL: types of relational operators are limited. Red ocean, but big return! Examples: Netezza, Exadata • NoSQL, Privacy, DM&ML: new techs are created every day! Blue ocean, many chances! Examples: An FPGA memcached appliance (S. Chalamalasetti, et al., FPGA'13); Deep Learning with COTS HPC (A. Coates, et al., ICML’13)