Scaling an ELK stack at

  1. Scaling an ELK stack. Elasticsearch NL meetup, 2014.09.22, Utrecht
  2. Who am I? Renzo Tomà • IT operations • Linux engineer • Python developer • Likes huge streams of raw data • Designed metrics & logsearch platform • Married, proud father of two. And you?
  3. ELK
  4. ELK at Logsearch platform. For developers & operations. Search & analyze log events using Kibana. Events from many sources (e.g. syslog, accesslog, log4j, …). Part of our infrastructure. Why? Faster root cause analysis => quicker time-to-repair.
  5. Real world examples. Case: release of new webshop version. Nagios alert: jboss processing time. Metrics: increase in active threads (and proctime). => Inconclusive! Find all HTTP requests that were slower than 5 seconds: @type:apache_access AND "www_bol_com" AND @fields.responsetimes:[5.000.000 TO *] => Hits for 1 URL. Enough for DEV to start its RCA.
  6. Real world examples. Case: strange performance spikes on webshop. Looks bad, but cause unknown. Find all errors in webshop log4j logging: @fields.application:wsp AND @fields.level:ERROR. Compare errors before vs during spike. Spot the difference. => Spikes caused by timeouts on a backend service. Metrics correlation: timeouts are not the cause, but a symptom of a full GC issue.
  7. Initial design (mid 2013’ish). Diagram labels: Servers, routers, firewalls … (syslog, central syslog server); Apache webservers (accesslog, tail, remote_syslog pkg); Java webapplications (JVM) (log4j syslog appender); log events => Logstash => Elasticsearch => Kibana2. Logstash acts as syslog server. Converts lines into events, into json docs. Using syslog protocol over UDP as transport. Even for accesslog + log4j.
  8. Initial attempt #fail. Single logstash instance not fast enough. Unable to keep up with events created. High CPU load, due to intensive grokking (regex). Network buffer overflow. UDP traffic dropped. Result: missing events.
  9. Initial attempt #fail. Log4j events can be multiline (e.g. stacktraces). Events are sent per line: 100 lines = 100 syslog msgs. Merging by Logstash. Remember the UDP drops? Result: - unparseable events (if 1st line was missing) - Swiss cheese. Stacktrace lines were missing.
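
The two failure modes on this slide can be reproduced with a small simulation. This is only a sketch: the merge rule (an indented line continues the previous event) and the sample stacktrace are assumptions for illustration, not the actual Logstash multiline configuration used in the talk.

```python
def merge_multiline(lines):
    """Group lines into events: an indented line continues the previous event.

    Returns (events, orphans). Orphans are continuation lines that arrived
    without any first line to attach to.
    """
    events, orphans = [], []
    for line in lines:
        if line.startswith((" ", "\t")):
            if events:
                events[-1].append(line)
            else:
                orphans.append(line)
        else:
            events.append([line])
    return events, orphans


# Hypothetical three-line log4j event, sent as three separate syslog messages.
sent = [
    "ERROR Checkout failed",
    "\tat shop.Cart.pay(Cart.java:42)",
    "\tat shop.Web.handle(Web.java:7)",
]

# Simulate UDP loss of individual messages.
lost_middle = [sent[0], sent[2]]   # "Swiss cheese": the event has a hole
lost_first = sent[1:]              # no first line: nothing to attach to

complete_events, _ = merge_multiline(sent)
holey_events, _ = merge_multiline(lost_middle)
no_events, orphan_lines = merge_multiline(lost_first)
```

With the first line lost, the remaining lines cannot be attributed to an event at all, which matches the "unparseable events" result on the slide.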
  10. Initial attempt #fail. Syslog RFC3164: “The total length of the packet MUST be 1024 bytes or less.” Rich Apache LogFormat + lots of cookies = 4 KB easily. Anything after byte 1024 got trimmed. Result: unparseable events (mismatch grok pattern).
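
Why trimming breaks parsing can be shown with a plain regular expression standing in for a grok pattern. The log layout, the pattern, and the 4000-character cookie below are made up for illustration; only the 1024-byte limit comes from the slide.

```python
import re

MAX_SYSLOG_PACKET = 1024  # RFC 3164 limit quoted on the slide

# Hypothetical access-log layout: ip, "request", status, "cookie", microseconds.
PATTERN = re.compile(
    r'^(?P<ip>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'"(?P<cookie>[^"]*)" (?P<micros>\d+)$'
)


def truncate_like_syslog(line: str, limit: int = MAX_SYSLOG_PACKET) -> str:
    """Keep only the first `limit` bytes of the line, dropping the rest."""
    return line.encode("ascii")[:limit].decode("ascii")


cookie = "x" * 4000  # large cookie header, roughly 4 KB
line = f'10.0.0.1 "GET / HTTP/1.1" 200 "{cookie}" 5200000'

full_match = PATTERN.match(line)
trimmed_match = PATTERN.match(truncate_like_syslog(line))

print(full_match is not None)   # the complete line matches the pattern
print(trimmed_match is None)    # the trimmed line no longer matches
```

The trimmed line ends in the middle of the cookie field, so the closing quote and the response time the pattern expects never arrive.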
  11. The only way is up. Improvement proposals: - Use queuing to make Logstash horizontally scalable. - Drop syslog as transport (for non-syslog). - Reduce amount of grokking. Pre-formatting at source scales better. Less complexity.
  12. Latest design (mid 2014’ish). Diagram labels: Servers, routers, firewalls … (syslog, central syslog server, local Logsheep); Apache webservers (accesslog in jsonevent format, local Logsheep); Java webapplications (JVM) (log4j jsonevent layout, log4j redis appender); lots of other sources; log events => Redis (queue) => Logstash (many instances) => Elasticsearch => Kibana 2 + 3. Events in jsonevent format. No grokking required.
  13. Current status #win - Logstash: up to 10 instances per env (because of logstash 1.1 version) - ES cluster (v1.0.1): 6 data + 2 client nodes - Each datanode has 7 datadisks (striping) - Indexing at 2k – 4k docs added per second - Avg. index time: 0.5 ms - Peak: 300M docs = 185 GB, per day - Searches: just a few per hour - Shardcount: 3 per idx, 1 replica, 3000 total - Retention: up to 60 days
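
As a sanity check, the peak-day figures can be converted into per-second and per-document averages. This is back-of-the-envelope arithmetic only; it assumes decimal gigabytes, and the slide does not say whether the 185 GB includes replicas.

```python
SECONDS_PER_DAY = 24 * 60 * 60

peak_docs_per_day = 300_000_000    # from the slide
peak_bytes_per_day = 185 * 10**9   # 185 GB, assuming decimal gigabytes

# Average rate over a peak day, and average stored size per document.
avg_docs_per_second = peak_docs_per_day / SECONDS_PER_DAY
avg_bytes_per_doc = peak_bytes_per_day / peak_docs_per_day

print(f"{avg_docs_per_second:.0f} docs/s averaged over the peak day")
print(f"{avg_bytes_per_doc:.0f} bytes per doc on average")
```

The averaged rate of roughly 3.5k docs per second falls inside the 2k – 4k indexing range given on the slide.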
  14. Our lessons learned. Before anything else! Start collecting metrics so you get a baseline. No blind tuning. Validate every change with facts. Our weapons of choice: • Graphite • Diamond (I am a contributor to the ES collector) • Jcollectd. Alternative: try Marvel.
  15. Logstash tip #1. Insert Redis as queue between source and logstash instances: - Scale Logstash horizontally - High availability (no events get lost). Diagram: two Redis instances feeding three Logstash instances.
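
The scaling idea behind this tip is the competing-consumers pattern: sources push to a shared queue and any number of workers pull from it. The sketch below uses Python's in-process `queue.Queue` as a stand-in for the Redis list, so it shows the work distribution only, not the durability or availability that Redis adds. The function name and the uppercase "filter" step are placeholders.

```python
import queue
import threading


def run_pipeline(events, worker_count):
    """Push events through a shared queue drained by `worker_count` workers."""
    broker = queue.Queue()   # stand-in for the Redis list
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            event = broker.get()
            if event is None:            # sentinel: no more work for this worker
                break
            with lock:
                processed.append(event.upper())   # placeholder for filtering

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()
    for event in events:
        broker.put(event)
    for _ in workers:
        broker.put(None)                 # one sentinel per worker
    for thread in workers:
        thread.join()
    return processed


source_events = [f"event {i}" for i in range(100)]
result = run_pipeline(source_events, worker_count=3)
```

Each event is handled by exactly one worker, so adding workers increases throughput without any coordination between them.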
  16. Logstash tip #2. Tune your workers. Find your chokepoint and increase its workers to improve throughput. Pipeline stages: Input, Filter, Output. $ top -H -p $(pgrep logstash)
  17. Logstash tip #3. Grok is very powerful, but CPU intensive. Hard to write, maintain and debug. Fix: scaling. Increase filterworkers (vertical) or add more Logstash instances (horizontal). Better: feed Logstash with jsonevent input. Solutions: • Log4j: use log4j-jsonevent-layout • Apache: define json output with LogFormat
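
Pre-formatting at the source means the application emits a structured document that needs no pattern matching downstream. A minimal sketch of that idea follows. The `@type` and `@fields` names appear in the queries earlier in this deck; `@timestamp`, `@message`, and the helper function itself are assumptions for illustration, not the exact output of log4j-jsonevent-layout.

```python
import json
from datetime import datetime, timezone


def to_jsonevent(message, event_type, fields, when=None):
    """Serialize one log event as a single-line JSON document.

    `when` must be a timezone-aware datetime; it defaults to the current time.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps({
        # ISO 8601 in UTC, trimmed from microseconds to milliseconds.
        "@timestamp": when.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "@type": event_type,
        "@message": message,
        "@fields": fields,
    })


line = to_jsonevent(
    "Checkout failed",
    "log4j",
    {"application": "wsp", "level": "ERROR"},
    when=datetime(2014, 9, 22, 10, 0, 0, tzinfo=timezone.utc),
)
doc = json.loads(line)   # the consumer only needs a JSON parser, no grok
```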
  18. Logstash tip #4 (last one). Use the Elasticsearch output with the HTTP protocol. Avoid version lock-in! HTTP may be slower, but newer ES means: - Lots of new features - Lots of bug fixes - Lots of performance improvements. Most important: you decide what versions to use. Logstash v1.4.2 (June ‘14) requires ES v1.1.1 (April ‘14). Latest ES version is v1.3.2 (Aug ‘14).
  19. Elasticsearch tip #1. Do not download a ‘great’ configuration. Elasticsearch is very complex. Lots of moving parts. Lots of different use-cases. Lots of configuration options. The defaults cannot be optimal for every one of them. Start with defaults: • Load it (stresstest or pre-launch traffic). • Check your metrics. • Find your chokepoint. • Change setting. • Verify and repeat.
  20. Elasticsearch tip #2. Increase the ‘index.refresh_interval’ setting. Refresh: make newly added docs available for search. Default value: one second. High impact on heavy indexing systems (like ours). Change it at runtime & check the metrics: $ curl -s -XPUT '0:9200/_all/_settings?index.refresh_interval=5s'
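
The same runtime change can be expressed as a JSON body sent to the index settings endpoint. The sketch below only builds the request and does not send it; the helper name and the `http://localhost:9200` base URL are assumptions, and the slide itself passes the setting as a query parameter instead.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, interval, index="_all"):
    """Build (but do not send) a PUT request that updates index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_refresh_interval_request("http://localhost:9200", "5s")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Keeping the build step separate from the send step makes it easy to log or review the change before applying it, in line with the "check the metrics" advice.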
  21. Elasticsearch tip #3. Use Curator to keep total shardcount constant. Uncontrolled shard growth may trigger a sudden hockey stick effect. Our setup: - 6 datanodes - 6 shards per index - 3 primary, 3 replica. “One shard per datanode” (YMMV)
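
The shard numbers in this deck can be tied together with a little arithmetic. The helper names are illustrative; the inputs (3 primaries per index, 1 replica, 3000 shards in total) are taken from the status slide.

```python
def shards_per_index(primaries, replicas_per_primary):
    """Every primary shard plus its replica copies."""
    return primaries * (1 + replicas_per_primary)


def total_shards(open_indices, primaries, replicas_per_primary):
    """Cluster-wide shard count for a number of identically configured indices."""
    return open_indices * shards_per_index(primaries, replicas_per_primary)


per_index = shards_per_index(primaries=3, replicas_per_primary=1)
indices_at_3000_shards = 3000 // per_index

print(per_index)                # shards added by every new index
print(indices_at_3000_shards)   # indices that make up the 3000-shard total
```

Without retention, every new daily index adds another 6 shards, which is the uncontrolled growth the slide warns about; deleting or closing old indices keeps the total flat.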
  22. Elasticsearch tip #4. Become experienced in rolling cluster restarts: - to roll out new Elasticsearch releases - to apply a config setting (e.g. heap, gc, ...) - because it will solve an incident. Control concurrency + bandwidth: cluster.routing.allocation.node_concurrent_recoveries, cluster.routing.allocation.cluster_concurrent_rebalance, indices.recovery.max_bytes_per_sec. Get confident enough to trust doing a rolling restart on a Saturday evening! (To get this graph.)
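
The three settings named on this slide can be changed together through the cluster settings API as transient settings. The sketch below only assembles the request body; the helper name and the example values (2, 2, "40mb") are placeholders, not the values used in the talk.

```python
import json


def recovery_throttle_settings(node_recoveries, cluster_rebalances, max_bytes_per_sec):
    """Body for PUT /_cluster/settings; transient settings reset on a full cluster restart."""
    return {
        "transient": {
            "cluster.routing.allocation.node_concurrent_recoveries": node_recoveries,
            "cluster.routing.allocation.cluster_concurrent_rebalance": cluster_rebalances,
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


# Placeholder values; tune them against your own metrics.
body = recovery_throttle_settings(2, 2, "40mb")
print(json.dumps(body, indent=2))
```

Higher values shorten recovery after a node restart at the cost of more disk and network load while the cluster is still serving traffic.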
  23. Elasticsearch tip #5 (last one). Cluster restarts improve recovery time. Recovery: compares replica vs primary shard. If different, recreate the replica. Costly (iowait) and very time consuming. But … difference is normal. Primary and replica have their own segment merge management: same docs, but different bytes. After recovery: replica is exact copy of primary. Note: only works for stale shards (no more updates). You have a lot of those when using daily Logstash indices.
  24. You can contact me via:, or
  25. 24
  26. Relocation in action
  27. Tools we use. Redis: key/value memory store, no-frills queuing, extremely fast. Used to scale logstash horizontally. Log4j redis appender: send log4j events to a Redis queue, non-blocking, batch, failover. Log4j jsonevent layout: format log4j events in logstash event layout. Why have logstash do lots of grokking, if you can feed it with logstash friendly json? Apache LogFormat: format Apache access logging in logstash event layout. Again: avoid grokking. (SOON) Logsheep: custom multi-threaded logtailer / udp listener, sends events to redis. Diamond: great metrics collector framework with Elasticsearch collector. I am a contributor. Curator: tool for automatic Elasticsearch index management (delete, close, optimize, bloom).

Editor's notes

  1. Log4j, multiline: why? Sent per line. Logstash needs to merge (multiline filter). Lots of messages + UDP drops = unparseable + Swiss cheese.