7. What is 5 min. for?
ISUCON
Our new service launches
Our services in trouble
Saturday, November 30, 2013
8. What can we do in 5 min.?
Investigate logs! Logs! Logs!
Hot request paths
Heavy request paths
How many requests? How many users?
and, and, and ...
9. Logs
Retrospection: past N min. logs
Inspection: logs now tailing
Prospection: incoming N min. logs
10. Retrospection in ISUCON
We MUST NOT be slaves to information.
Too much is worse.
We MUST at least know the factors.
Too little is worse.
11. analyze_apache_logs
Bundled with Apache::Log::Parser (in CPAN)
Reads logs from STDIN and analyzes them
For each method/path:
HTTP response status code
Response duration (avg/min/max)
Query Strings / Referers (option)
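The per-path aggregation listed on this slide can be sketched in Python. This is not the Apache::Log::Parser implementation. The log layout (combined format with the response duration in microseconds appended as the last field) and the names `LINE_RE` and `summarize` are assumptions made for illustration, and the sample lines are made up.

```python
import re
from collections import defaultdict

# Assumed layout: ... "METHOD PATH PROTO" STATUS BYTES ... DURATION_US
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .* (?P<duration>\d+)$'
)

def summarize(lines):
    """Group log lines by path; collect status counts and duration stats."""
    stats = defaultdict(lambda: {"durations": [], "status": defaultdict(int)})
    for line in lines:
        m = LINE_RE.search(line.strip())
        if m is None:
            continue  # skip lines that do not match the assumed layout
        path = m.group("path").split("?", 1)[0]  # drop the query string
        entry = stats[path]
        entry["durations"].append(int(m.group("duration")))
        entry["status"][m.group("status")] += 1
    report = {}
    for path, entry in stats.items():
        d = entry["durations"]
        report[path] = {
            "avg": sum(d) / len(d),
            "min": min(d),
            "max": max(d),
            "status": dict(entry["status"]),
        }
    return report

# Hypothetical sample lines, not taken from a real access log.
sample = [
    '127.0.0.1 - - [30/Nov/2013:10:00:00 +0900] "GET /entry HTTP/1.1" 200 512 "-" "ua" 41780',
    '127.0.0.1 - - [30/Nov/2013:10:00:01 +0900] "GET /entry HTTP/1.1" 200 512 "-" "ua" 378686',
    '127.0.0.1 - - [30/Nov/2013:10:00:02 +0900] "POST /follow HTTP/1.1" 302 0 "-" "ua" 4032',
]
report = summarize(sample)
```

Keeping only avg/min/max and status counts per path follows the "not too many, not too few" idea from the previous slide: enough to spot hot and heavy paths without reading every line.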
12. $ cat /var/log/httpd/access_log | analyze_apache_logs -s path
TOTAL: 1801
*    duration avg:97.33, min:76, max:110    status 200:3
/    duration avg:73517.00, min:6617, max:134667    status 200:6
/entry    duration avg:168814.06, min:41780, max:378686    status 200:33
/entry/15035    duration avg:34386.00, min:34386, max:34386    status 200:1
/follow    duration avg:171574.81, min:4032, max:610354    status 200:145
/icon    duration avg:262889.95, min:117225, max:784451    status 200:21
/icon/03df2637e15ff22eeb825d3aa664c2ecbf399cbc0257c94db002497d508a476c    duration avg:292981.50, min:239181, max:346782    status 200:2
/icon/06e3640fd416acffbbc63177bf5a65b9981de8dc3aae19ca9224fcf45c6fa1f6    duration avg:270258.61, min:73933, max:492001    status 200:18
/icon/09228075c09882cbf065a30848e79bdc3e43f7b43273be98304a5f7712aa37d8    duration avg:198728.00, min:116202, max:271046    status 200:3
/icon/0ab3a5827c926a148ef28d572e44a878a99ceecc11296025319f21826b77f352    duration avg:250647.07, min:63798, max:503243    status 200:14
/icon/0d5f799ba92380f94f6108521aacb50280da2a731a9d5fb19d6da1f224837a4a
13. Retrospection in action
Shib: Hive WebUI -> mapreduce
e.g. N min. of logs from 10 min. ago
Import lag / MapReduce lag
Kibana: Elasticsearch WebUI
Scalability?
Fluentd + GrowthForecast
without on-demand queries
18. Norikra(1):
Schema-less event stream:
Add/Remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF
Truly Complex events:
Nested Hash/Array, accessible directly from SQL
19. Norikra(2):
Open source software:
Licensed under GPLv2
Based on Esper
UDF plugins from rubygems.org
Ultra-fast bootstrap & small start:
3mins to install/start
1 server
25. Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
26. Norikra Queries: (3)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
every 5 mins
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
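What the query above computes can be sketched in Python as a tumbling window that counts events per `age` and emits one result set per window. This only emulates the idea and is not how Norikra or Esper implement `win:time_batch`: here windows are aligned to multiples of the window length and empty windows are skipped, both simplifications. The function name `time_batch_counts`, the timestamps, and all events other than the first are hypothetical.

```python
from collections import Counter

def time_batch_counts(events, window_sec=300, key="age"):
    """Group (timestamp, event) pairs into fixed windows and emit
    one list of {key, cnt} rows per non-empty window."""
    batches = {}
    for ts, event in events:
        window = int(ts // window_sec)  # index of the window this event falls in
        batches.setdefault(window, Counter())[event[key]] += 1
    return [
        [{key: value, "cnt": cnt} for value, cnt in counter.most_common()]
        for _, counter in sorted(batches.items())
    ]

# Timestamps are seconds since an arbitrary start (assumed for the sketch).
events = [
    (10, {"name": "tagomoris", "age": 34}),
    (70, {"name": "a", "age": 34}),
    (120, {"name": "b", "age": 33}),
    (200, {"name": "c", "age": 34}),
    (310, {"name": "d", "age": 51}),  # falls into the next 5-minute window
]
result = time_batch_counts(events)
```

With this input the first window yields the same shape as the slide's output (`age` 34 counted 3 times, `age` 33 once), and the event at 310 seconds is reported in the second window.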
27. Norikra Queries: (4)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}
SELECT age, COUNT(*) as cnt
FROM
events.win:time_batch(5 mins)
GROUP BY age
SELECT max(age) as max
FROM
events.win:time_batch(5 mins)
every 5 mins
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
{"max":51}
28. Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
29. Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
30. Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Kyoto" AND attend.$0 AND attend.$1
GROUP BY user.age
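The nested access used in this query (`user.age`, `attend.$0`) can be pictured as walking a path through the event. The Python sketch below is an illustration only, not Norikra's field resolver: `get_field` is a hypothetical helper, and the `attend` list is cut to the three values visible on the slide.

```python
def get_field(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0'.
    A '$N' segment is treated as index N into a list."""
    value = event
    for part in path.split("."):
        if part.startswith("$"):
            value = value[int(part[1:])]  # array element
        else:
            value = value[part]           # hash key
    return value

event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Kyoto",
    "speaker": True,
    "attend": [True, True, False],
}

# Equivalent of: WHERE current="Kyoto" AND attend.$0 AND attend.$1
matches = (
    get_field(event, "current") == "Kyoto"
    and get_field(event, "attend.$0")
    and get_field(event, "attend.$1")
)
group_key = get_field(event, "user.age")
```

An event that passes the filter is then counted under its `user.age` value, here 34.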
31. Before: Hive
EVERY HOUR!
SELECT
yyyymmdd, hh, campaign_id, region, lang,
count(*) AS click,
count(distinct member_id) AS uu
FROM (
SELECT
yyyymmdd,
hh,
get_json_object(log, '$.campaign.id') AS campaign_id,
get_json_object(log, '$.member.region') AS region,
get_json_object(log, '$.member.lang') AS lang,
get_json_object(log, '$.member.id') AS member_id
FROM applog
WHERE service='myservice'
AND yyyymmdd='20131101' AND hh='00'
AND get_json_object(log, '$.type') = 'click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
32. After: Norikra
SELECT
campaign.id AS campaign_id, member.region AS region,
count(*) AS click,
count(distinct member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region
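For readers less familiar with SQL, the aggregation in the Norikra query above (clicks and unique users per campaign and region) can be sketched in plain Python over one window's worth of events. The function name `click_report` and the sample events are hypothetical; windowing is left out here.

```python
from collections import defaultdict

def click_report(events):
    """Count clicks and distinct members per (campaign.id, member.region)."""
    clicks = defaultdict(int)
    members = defaultdict(set)
    for e in events:
        if e.get("type") != "click":
            continue  # WHERE type="click"
        group = (e["campaign"]["id"], e["member"]["region"])
        clicks[group] += 1                     # count(*)
        members[group].add(e["member"]["id"])  # count(distinct member.id)
    return {g: {"click": clicks[g], "uu": len(members[g])} for g in clicks}

# Made-up events with the nested shape the queries assume.
events = [
    {"type": "click", "campaign": {"id": "c1"}, "member": {"id": "m1", "region": "JP"}},
    {"type": "click", "campaign": {"id": "c1"}, "member": {"id": "m1", "region": "JP"}},
    {"type": "click", "campaign": {"id": "c1"}, "member": {"id": "m2", "region": "JP"}},
    {"type": "view", "campaign": {"id": "c1"}, "member": {"id": "m3", "region": "JP"}},
    {"type": "click", "campaign": {"id": "c2"}, "member": {"id": "m3", "region": "US"}},
]
report = click_report(events)
```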
33. Before: Fluentd
FOR EACH SERVICE
<match for.target.service>
type numeric_monitor
unit minute
tag service.response
output_key_prefix request_api
aggregate all
monitor_key api_response_time
percentiles 50,90,95,98,99
</match>
... AND RESTART OF FLUENTD!!!!!!!!!!!!!!
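The kind of per-minute summary this configuration asks for (count, min, max, average, and the listed percentiles of `api_response_time`) can be sketched in Python. The nearest-rank percentile method, the function names, and the output key names are assumptions for illustration; the plugin's own calculation and key naming may differ.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value with at least
    p percent of the samples at or below it."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]

def monitor(values, percentiles=(50, 90, 95, 98, 99)):
    """Summarize one unit (e.g. one minute) of response times."""
    ordered = sorted(values)
    result = {
        "num": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / len(ordered),
    }
    for p in percentiles:
        result[f"percentile_{p}"] = percentile(ordered, p)
    return result

# Hypothetical response times collected during one minute.
response_times = list(range(1, 101))
summary = monitor(response_times)
```

In the Fluentd setup, changing which percentiles are computed means editing the configuration and restarting, which is the pain point this slide highlights.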
35. Conclusion
Retrospections are important
We have many methods for retrospections now
Prospections are also important
For complex logs
For immediate reports
For less system management