8. Batch processing
RDBMS, Hadoop/Hive, ....
(transactions are out of scope for this topic)
Target window: hours to weeks (or more)
Total throughput: HIGHEST
Query latency: LARGEST (20 sec to minutes to hours)
Wednesday, July 9, 2014
9. Stream processing
Storm, Esper, Norikra, Fluentd, ....
Kafka(?), Spark streaming(?)
Target window: seconds - hours
Total throughput: Normal
Query latency: SMALLEST (milliseconds)
Queries must be written BEFORE the data arrives
Once registered, runs forever
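A minimal Python sketch of the "query first, data later" model described above. The `StreamEngine` class and its `register` and `send` methods are hypothetical names chosen for illustration; they are not the API of Storm, Esper, Norikra, or Fluentd.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


class StreamEngine:
    """Toy engine: queries are registered up front and see only later events."""

    def __init__(self) -> None:
        self.queries: List[Callable[[Event], None]] = []

    def register(self, query: Callable[[Event], None]) -> None:
        # Once registered, the query is applied to every future event.
        self.queries.append(query)

    def send(self, event: Event) -> None:
        # The event is handed to each registered query and is not stored.
        for query in self.queries:
            query(event)


engine = StreamEngine()
seen: List[Event] = []

engine.send({"v1": True})     # arrives before any query exists: nobody sees it
engine.register(seen.append)  # the query is registered here
engine.send({"v1": False})    # only events from now on reach the query
print(seen)                   # [{'v1': False}]
```

The first event is lost because no query was registered when it arrived, which is why stream queries have to exist before the data does.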
10. Data flow and latency
[Figure: data window vs. query execution over time]
Batch: data window, then query execution
Stream: data window with incremental query execution
11. Query for stored data
[Figure: a table holding many rows of v1,v2,v3,v4,v5,v6]
At first, all data MUST be stored.
12. Query for stored data
[Figure: a query runs against the stored table of v1,v2,v3,v4,v5,v6 rows]
SELECT v1,v2,COUNT(*)
FROM table
WHERE v3='x' GROUP BY v1,v2
13. Query for stored data
[Figure: two queries run against the stored table of v1,v2,v3,v4,v5,v6 rows]
SELECT v1,v2,COUNT(*)
FROM table
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table
WHERE v1 AND v2 GROUP BY v4
14. Query for stored data
[Figure: two queries run against the stored table of v1,v2,v3,v4,v5,v6 rows]
SELECT v1,v2,COUNT(*)
FROM table
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table
WHERE v1 AND v2 GROUP BY v4
“All data” includes
“data that will never be used”.
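To make the batch model concrete, here is a small Python sketch using the standard `sqlite3` module. The table name `events` is a stand-in for the slide's `table` (a reserved word in SQLite), the sample rows are invented, and the `ORDER BY` clauses are added only to make the printed output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "events" replaces the slide's "table", which is a reserved word in SQLite.
conn.execute(
    "CREATE TABLE events (v1 INTEGER, v2 INTEGER, v3 TEXT, v4 TEXT, v5 TEXT, v6 TEXT)"
)

# Batch model: every column of every row is stored before any query runs.
rows = [
    (1, 1, "x", "a", "unused", "unused"),
    (1, 1, "x", "b", "unused", "unused"),
    (1, 0, "y", "a", "unused", "unused"),
    (0, 1, "x", "a", "unused", "unused"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)

by_flags = conn.execute(
    "SELECT v1, v2, COUNT(*) FROM events WHERE v3 = 'x' "
    "GROUP BY v1, v2 ORDER BY v1, v2"
).fetchall()
by_v4 = conn.execute(
    "SELECT v4, COUNT(*) FROM events WHERE v1 AND v2 GROUP BY v4 ORDER BY v4"
).fetchall()

print(by_flags)  # [(0, 1, 1), (1, 1, 2)]
print(by_v4)     # [('a', 1), ('b', 1)]
```

Neither query ever reads `v5` or `v6`, yet both columns were stored for every row.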
15. Query for stream data
[Figure: events of v1,v2,v3,v4,v5,v6 flow through the stream toward two registered queries]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4
16. Query for stream data
[Figure: from each incoming event of v1,v2,v3,v4,v5,v6, the first query takes v1,v2,v3 and the second takes v1,v2,v4]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4
17. Query for stream data
[Figure: from each incoming event of v1,v2,v3,v4,v5,v6, the first query takes v1,v2,v3 and the second takes v1,v2,v4; more events follow in the stream]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4
18. Query for stream data
[Figure: from each incoming event of v1,v2,v3,v4,v5,v6, the first query takes v1,v2,v3 and the second takes v1,v2,v4; more events follow in the stream]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4
All data is discarded
right after it is inserted.
(Bye-bye storage system maintenance!)
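The idea that each query takes only the fields it reads, after which the event is dropped, can be sketched in Python as follows. The query names in `QUERY_FIELDS` and the `project` helper are hypothetical; the field lists are taken from the two queries on the slide.

```python
from typing import Dict, Tuple

Event = Dict[str, object]

# Fields each registered query actually reads (from the two queries above).
QUERY_FIELDS: Dict[str, Tuple[str, ...]] = {
    "count_by_v1_v2": ("v1", "v2", "v3"),
    "count_by_v4": ("v1", "v2", "v4"),
}


def project(event: Event) -> Dict[str, Event]:
    """Return, per query, a copy of the event holding only the fields it reads."""
    return {
        name: {field: event[field] for field in fields}
        for name, fields in QUERY_FIELDS.items()
    }


event = {"v1": True, "v2": False, "v3": "x", "v4": "a", "v5": "big", "v6": "blob"}
views = project(event)
del event  # the full event is dropped; v5 and v6 are not kept anywhere
print(views)
```

Nothing outside the per-query views survives, so there is no stored data set to maintain.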
19. Incremental calculation
[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream one by one]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory):
v1     v2     COUNT
TRUE   TRUE   0
TRUE   FALSE  1
FALSE  TRUE   33
FALSE  FALSE  2
20. Incremental calculation
[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream one by one]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory):
v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  1
FALSE  TRUE   33
FALSE  FALSE  2
21. Incremental calculation
[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream one by one]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory):
v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  1
FALSE  TRUE   34
FALSE  FALSE  2
22. Incremental calculation
[Figure: events of v1,v2,v3,v4,v5,v6 keep arriving from the stream]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory):
v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  2
FALSE  TRUE   37
FALSE  FALSE  3
The internal data
fits in memory.
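The incremental calculation shown in these slides can be sketched in Python with a single counter per group. The `on_event` function name and the sample events are assumptions made for illustration; the filter and grouping follow the query on the slide.

```python
from collections import Counter
from typing import Dict

# Internal data: one counter per (v1, v2) group. This is all the query keeps.
counts: Counter = Counter()


def on_event(event: Dict[str, object]) -> None:
    """Update the running COUNT(*) WHERE v3='x' GROUP BY v1,v2 for one event."""
    if event["v3"] == "x":
        counts[(event["v1"], event["v2"])] += 1
    # The event itself is not retained after this point.


for e in [
    {"v1": True, "v2": True, "v3": "x"},
    {"v1": False, "v2": True, "v3": "x"},
    {"v1": False, "v2": True, "v3": "y"},  # filtered out by WHERE
    {"v1": False, "v2": True, "v3": "x"},
]:
    on_event(e)

print(dict(counts))  # {(True, True): 1, (False, True): 2}
```

Memory use grows with the number of distinct groups, not with the number of events processed.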
23. Data window
Target time (or size) range of queries
Batch (or short-batch)
FROM-TO: WHERE dt >= '2014-07-07 00:00:00'
AND dt <= '2014-07-08 23:59:59'
Stream
“Calculate this query every 3 minutes”
Extended SQL required:
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
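A fixed-width ("every 3 minutes") window can be sketched in Python as a generator that emits the grouped counts whenever a window closes. This is only an illustration of the idea: the `time_batch` function here is a hypothetical helper, it advances on event timestamps rather than a wall-clock timer as a real engine would, and the sample events are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Dict[str, object]
Counts = Dict[Tuple[object, object], int]


def time_batch(
    events: Iterable[Tuple[float, Event]], width: float
) -> Iterator[Tuple[float, Counts]]:
    """Yield (window_start, counts) each time a fixed-width window closes.

    `events` are (timestamp_seconds, event) pairs in timestamp order.
    """
    window_start: Optional[float] = None
    counts: Counter = Counter()
    for ts, event in events:
        if window_start is None:
            window_start = ts - ts % width
        # Close every window that ended before this event.
        while ts >= window_start + width:
            yield window_start, dict(counts)
            counts = Counter()
            window_start += width
        if event["v3"] == "x":
            counts[(event["v1"], event["v2"])] += 1
    if window_start is not None:
        yield window_start, dict(counts)  # flush the last, still-open window


stream = [
    (0.0, {"v1": True, "v2": True, "v3": "x"}),
    (100.0, {"v1": True, "v2": True, "v3": "x"}),
    (190.0, {"v1": False, "v2": True, "v3": "x"}),
]
results = list(time_batch(stream, width=180.0))  # 180 s = "every 3 minutes"
print(results)
# [(0.0, {(True, True): 2}), (180.0, {(False, True): 1})]
```

Unlike the batch FROM-TO range, the window here is part of the query itself, which is what the extended SQL syntax expresses.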
25. Stream processing with SQL
Esper: Java library for stream processing
Esper EPL:
SELECT param1, param2
FROM tbl
WHERE age > 30
26. Stream processing with SQL
Esper: Java library for stream processing
Esper EPL:
SELECT param, COUNT(*) AS c
FROM tbl
WHERE age > 30
GROUP BY param
27. Stream processing with SQL
Esper: Java library for stream processing
Esper EPL:
SELECT param, COUNT(*) AS c
FROM tbl.win:time_batch(1 hour)
WHERE age > 30
GROUP BY param
29. Norikra:
Schema-less Stream Processing with SQL
Server software, runs on JVM
Open source software (GPLv2)
http://norikra.github.io/
https://github.com/norikra/norikra
30. Norikra:
Schema-less event stream:
Add/Remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF (in Java/Ruby from rubygem)
Truly Complex events:
Nested Hash/Array, accessible directly from SQL
HTTP RPC w/ JSON or MessagePack (fluentd plugin available!)
31. How to set up Norikra:
Install JRuby
download jruby.tar.gz, extract it, and add its bin directory to $PATH
or: ‘rbenv install jruby-1.7.xx’ & ‘rbenv shell jruby-..’
Install Norikra
‘gem install norikra’
Execute Norikra server
‘norikra start’
32. Norikra Interface:
Command line: norikra-client
norikra-client target open ...
norikra-client query add ...
tail -f ... | norikra-client event send ...
WebUI
show status
show/add/remove queries
HTTP API
JSON, MessagePack
39. Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Output (every 5 mins): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
Input event:
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Tsukuba"}
40. Norikra Queries: (4)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Output (every 5 mins): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
Output (every 5 mins): {"max":51}
Input event:
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Tsukuba"}
41. Norikra Queries: (5)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Tsukuba",
"speaker":true,
"attend":[true,true,false, ...]
}
42. Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Tsukuba",
"speaker":true,
"attend":[true,true,false, ...]
}
43. Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Tsukuba"
AND attend.$0 AND attend.$1
GROUP BY user.age
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}
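How the dotted paths in this query map onto the nested event can be sketched in Python. The `lookup` and `matches` helpers are hypothetical and only mimic the `user.age` and `attend.$0` notation used on the slide; this is not how Norikra implements field access.

```python
from typing import Any, Dict


def lookup(event: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path; a "$N" segment indexes into a list."""
    value: Any = event
    for part in path.split("."):
        if part.startswith("$"):
            value = value[int(part[1:])]
        else:
            value = value[part]
    return value


def matches(event: Dict[str, Any]) -> bool:
    # WHERE current="Tsukuba" AND attend.$0 AND attend.$1
    return (
        lookup(event, "current") == "Tsukuba"
        and bool(lookup(event, "attend.$0"))
        and bool(lookup(event, "attend.$1"))
    )


event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Kyoto",
    "speaker": True,
    "attend": [True, True, False],
}
print(lookup(event, "user.age"))  # 34
print(matches(event))             # False: "current" is "Kyoto", not "Tsukuba"
```

With `"current": "Tsukuba"` the same event would pass the filter and be counted under `user.age` 34.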
44. Use cases in the real world
Enjoy the following sessions!
46. See also:
http://norikra.github.io/
“Batch processing and Stream processing by SQL”
http://www.slideshare.net/tagomoris/hcj2014-sql
“Log analysis systems and its designs in LINE Corp 2014 Early”
http://www.slideshare.net/tagomoris/log-analysis-system-and-its-designs-in-line-corp-2014-early
“Norikra in Action”
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring