Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
2. Large Scale Log Analytics with Solr
Rafał Kuć and Radu Gheorghe
Sematext Group
11. Parse JSON
input {
  file {
    # Apache combined logs, already in JSON
    path => "/opt/logs/example.log.parsed"
    start_position => "beginning"
    …
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  solr_http {
    …
  }
}
bin/logstash -f logstash.conf -w 4 # filterWorkers=4
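To make the json filter step concrete, here is a minimal Python sketch of what it does to one event: it parses the JSON text found in the message field and merges the resulting keys into the event. The function name apply_json_filter and the sample field names are assumptions made for this illustration, not part of Logstash. On a parse failure the sketch adds the _jsonparsefailure tag, which is the tag Logstash uses by default.

```python
import json

def apply_json_filter(event, source="message"):
    """Parse the JSON text in event[source] and merge its keys into a copy of the event."""
    result = dict(event)  # shallow copy so the caller's event is not modified
    try:
        parsed = json.loads(event[source])
    except (KeyError, TypeError, ValueError):
        # Missing field or invalid JSON: tag the event instead of dropping it.
        result["tags"] = list(result.get("tags", [])) + ["_jsonparsefailure"]
        return result
    if isinstance(parsed, dict):
        result.update(parsed)  # top-level JSON keys become event fields
    return result

# Hypothetical event, as it might be read from example.log.parsed
event = apply_json_filter({"message": '{"clientip": "10.0.0.1", "response": 200}'})
```

The original message field is kept next to the parsed fields, so the raw line stays available unless it is removed explicitly.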
13. Grok
input {
  file {
    path => "/opt/logs/example.log"
    start_position => "beginning"
    …
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
output {
  solr_http {
    …
  }
}
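Grok patterns are named regular expressions, so the work done per line can be approximated with a single regex with named groups. The sketch below is a simplified stand-in for %{COMBINEDAPACHELOG}, not the actual pattern shipped with Logstash: the group names follow the field names that pattern traditionally produces, and the sample log line is made up for the example.

```python
import re

# Simplified approximation of the Apache combined log format.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Made-up sample line in combined format
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
fields = parse_combined(sample)
```

Every value comes back as a string; converting response or bytes to numbers would be a separate step.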
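21. Flow in rsyslog
[Diagram: the inputs (/var/log/apache.log, syslog socket) feed the main queue (RAM+Disk), which is tuned through queue.type, queue.size, ... Worker threads (queue.workerThreads) filter, parse and send events, taking them off the queue in batches (queue.dequeueBatchSize). The action formats events with a {JSON} template and hands them to rsyslog_solr.py, which is shown three times.]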
23. Simple Config (1/2)
https://github.com/sematext/lucene-revolution-samples/tree/master/2015
module(load="imfile")
module(load="omprog")
# the input file holds Apache combined logs
input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")
main_queue(
  queue.highWatermark="100000"
  queue.lowWatermark="50000"
  queue.maxDiskSpace="5g"
  queue.fileName="solr_action"
  queue.spoolDirectory="/opt/rsyslog/queues"
  queue.saveOnShutdown="on"
  queue.workerThreads="4"
  queue.dequeueBatchSize="500"
)
24. Simple Config (2/2)
template(name="json_lines" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"message\":\"")
  property(name="msg")
  ...
  constant(value="\",\"syslog-tag\":\"")
  property(name="syslogtag")
  constant(value="\"}\n")
}
action(
  type="omprog"
  binary="/opt/rsyslog/rsyslog_solr.py"
  template="json_lines"
)
Get rsyslog_solr.py from https://github.com/rsyslog/rsyslog/tree/master/plugins/external/solr
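The omprog action starts the configured binary and writes each rendered template line to its standard input. The following Python sketch shows one way a consumer could group those JSON lines into batches and wrap each batch in an HTTP POST. It is not the actual rsyslog_solr.py; the function names, the batch size and the Solr URL used in the usage comment are assumptions made for this illustration.

```python
import json
import urllib.request

def batches(lines, size):
    """Yield lists of up to `size` parsed JSON documents from an iterable of lines."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush whatever is left at end of input

def build_request(docs, url):
    """Build (but do not send) a POST request carrying the batch as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def run(lines, url, size=500, send=urllib.request.urlopen):
    """Send every batch read from `lines` to `url`; `send` is swappable for testing."""
    for docs in batches(lines, size):
        send(build_request(docs, url))

# Hypothetical usage from a script started by omprog:
#   import sys
#   run(sys.stdin, "http://localhost:8983/solr/logs/update")
```

A production consumer would also flush on a timer, so that a partially filled batch is not held back while the input is quiet.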
28. JSON Config
# same main queue settings and modules
# the input file holds Apache combined logs, already parsed in JSON
input(type="imfile"
  File="/opt/logs/example.log.parsed"
  Tag="apache:")
module(load="mmnormalize")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/json.rb"
)
template(name="json_lines" type="list") {
  property(name="$!root") constant(value="\n")
}
action(type="omprog"
  ...
)
Rulebase (/opt/rsyslog/json.rb):
version=2
rule=:%root:json%
30. Normalizing Config
input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/apache_combined.rb"
)
template(name="json_lines" type="list") {
  property(name="$!all-json")
  constant(value="\n")
}
Rulebase (/opt/rsyslog/apache_combined.rb):
version=2
rule=:%[
  {"type": "word", "name": "clientip"},
  {"type": "literal", "text": " "},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%
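The rule above is a list of small parsers applied to the line from left to right. As a rough model of that idea, the Python sketch below interprets the same four parser types (word, literal, char-to, rest) over a string. It is a simplification written for this illustration, not liblognorm's implementation: liblognorm compiles rules into a parse tree, supports many more parser types, and the shortened rule and sample line here are made up.

```python
def normalize(line, rule):
    """Apply a list of field parsers to `line` left to right; return a dict or None."""
    pos, out = 0, {}
    for field in rule:
        kind = field["type"]
        if kind == "literal":  # fixed text that must appear at this position
            text = field["text"]
            if not line.startswith(text, pos):
                return None
            pos += len(text)
            continue
        if kind == "word":  # everything up to the next space
            end = line.find(" ", pos)
            end = len(line) if end == -1 else end
        elif kind == "char-to":  # everything up to the character in extradata
            end = line.find(field["extradata"], pos)
            if end == -1:
                return None
        elif kind == "rest":  # whatever remains, possibly nothing
            end = len(line)
        else:
            raise ValueError("unsupported parser type: " + kind)
        if kind != "rest" and end == pos:
            return None  # word and char-to need at least one character
        out[field["name"]] = line[pos:end]
        pos = end
    return out if pos == len(line) else None  # the whole line must be consumed

# Shortened, made-up rule in the same shape as the rulebase above
rule = [
    {"type": "word", "name": "clientip"},
    {"type": "literal", "text": " \""},
    {"type": "char-to", "name": "agent", "extradata": "\""},
    {"type": "literal", "text": "\""},
    {"type": "rest", "name": "blob"},
]
parsed = normalize('10.0.0.1 "curl/7.43"', rule)
```

Each parser does one cheap scan forward, so the cost of a single rule grows with the length of the line.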
32. Normalizing “Should Scale”*
* Performance depends mostly on log length and not on the number of rules:
http://blog.gerhards.net/2013/01/performance-of-liblognormrsyslog-parse.html
33. Normalizing with Five Rules
input(type="imfile"
  File="/opt/logs/example*"
  Tag="apache")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/multiple_rules.rb"
)
if $!root <> "" then {
  set $.final-json = $!root;
} else {
  set $.final-json = $!all-json;
}
template(name="json_lines" type="list") {
  property(name="$.final-json") constant(value="\n")
}
Rulebase (/opt/rsyslog/multiple_rules.rb):
rule=apache_combined:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%
rule=apache_common:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "number", "name": "bytes"},
  {"type": "rest", "name": "blob", "priority": 65535}
]%
...
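The if/else in this config picks what gets shipped: when the JSON rule matched, the parsed document sits under root and is sent as is; otherwise all fields extracted by one of the other rules are sent. A minimal Python model of that choice, with a plain dict standing in for rsyslog's message variables (the function name and the sample fields are assumptions for this illustration):

```python
import json

def final_json(fields):
    """Return the JSON text to ship for one normalized message."""
    root = fields.get("root")
    if root:
        # The JSON rule matched: the document is already complete under "root".
        return json.dumps(root, sort_keys=True)
    # Another rule matched: ship every field that was extracted.
    return json.dumps(fields, sort_keys=True)

# Made-up examples: one pre-parsed JSON line, one line parsed field by field
from_json_rule = final_json({"root": {"clientip": "10.0.0.1"}})
from_other_rule = final_json({"clientip": "10.0.0.1", "bytes": "512"})
```

Keys are sorted only to make the output deterministic in the example.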
44. Schema: Two Kinds of Fields
message:failed
"docValues": true
"omitNorms": true,
"omitTermFreqAndPositions": true
+20 to 100% capacity*
10% faster indexing*
* http://blog.sematext.com/2014/11/17/solr-presentations-lucene-solr-revolution/
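One way to read this slide is that fields you search in (such as message) can drop norms and term frequencies/positions, while fields you sort or facet on benefit from docValues. Under that reading, the Python sketch below builds a payload for Solr's Schema API add-field command. The field names, field types and which field gets which option are assumptions made for this illustration; only the property names come from the slide.

```python
import json

def field(name, field_type, **properties):
    """Describe one schema field as a plain dict."""
    definition = {"name": name, "type": field_type}
    definition.update(properties)
    return definition

def schema_payload(fields):
    """Serialize the fields as a Schema API "add-field" command."""
    return json.dumps({"add-field": fields}, sort_keys=True)

# Searched field (assumed): no length norms, no term frequencies or positions.
message = field("message", "text_general", indexed=True, stored=True,
                omitNorms=True, omitTermFreqAndPositions=True)
# Sorted or faceted field (assumed): column-oriented docValues.
response = field("response", "string", indexed=True, stored=True, docValues=True)

payload = schema_payload([message, response])
```

Omitting positions has a cost worth knowing: phrase queries on that field no longer work, so it suits logs that are searched by individual terms.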
48. Time-Based Collections
Latest collection: indexing, merges, most searches
Older collections: content doesn’t change => cache friendly => can be optimized
Oldest collection: delete without triggering merges
20-30x capacity; less indexing degradation*
* http://www.slideshare.net/sematext/side-by-side-with-elasticsearch-solr-part-2
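With time-based collections, the indexer has to decide which collection an event belongs to, and a housekeeping job has to decide which collections are past retention. The Python sketch below covers both, assuming one collection per UTC day named logs_YYYY-MM-DD. The naming scheme, the daily granularity and the retention period are assumptions made for this illustration.

```python
from datetime import date, datetime, timedelta, timezone

def collection_for(timestamp, prefix="logs"):
    """Name of the daily collection for a timezone-aware event timestamp."""
    day = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return "{}_{}".format(prefix, day)

def expired(collections, today, keep_days, prefix="logs"):
    """Collections whose day is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in collections:
        day = datetime.strptime(name[len(prefix) + 1:], "%Y-%m-%d").date()
        if day < cutoff:
            old.append(name)
    return old

# Made-up example: route one event, then find collections older than 7 days
target = collection_for(datetime(2015, 10, 15, 23, 30, tzinfo=timezone.utc))
to_delete = expired(
    ["logs_2015-10-01", "logs_2015-10-14", "logs_2015-10-15"],
    today=date(2015, 10, 15),
    keep_days=7,
)
```

Each name returned by expired can be dropped as a whole collection, which is what avoids the merges that deleting individual documents would trigger.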