Technology Behind Real-Time Log Analytics

Data Science Thailand
28 May 2016


  1. Technology Behind Real-Time Log Analytics. ELK: Elasticsearch, Logstash and Kibana. By Supaket Wongkampoo @ Predictive Analytics and Data Science Conference, 28 May 2016
  2. SUPAKET WONGKAMPOO, Software Engineer @ Agoda. *Passionate about DevOps* - Full-stack developer - Virtualisation and infrastructure as code (Puppet/Ansible) - Release management and continuous development - Real-time log analytics
  3. State of the Art: Logging Terminology in Large-Scale Data Processing
  4. Common use cases • *Issue debugging • *Performance analysis • Security analysis • *Predictive analysis • Internet of Things (IoT) and logging
  5. Challenges in log analysis • *Inconsistent log formats • *Decentralized logs • Need for expert knowledge
  6. Inconsistent log formats
     TOMCAT LOGS: A typical Tomcat server startup log entry looks like this:
       May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR
       INFO: Deployment of web application archive soft\apache-tomcat-7.0.62\webapps\sample.war has finished in 253 ms
     APACHE ACCESS LOGS (COMBINED LOG FORMAT): A typical Apache access log entry looks like this:
       - - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630
     IIS LOGS: A typical IIS log entry looks like this:
       2012-05-02 17:42:15 - 80 GET /images/favicon.ico - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)
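The three samples on this slide carry similar information (a timestamp plus an event or request) in three different shapes. As a minimal sketch of the normalization that Logstash's grok and date filters perform, the Python below parses each sample into one common event dict with an ISO-8601 timestamp. The regexes, the output field names and the IIS field order (which a real IIS log defines in its #Fields: header) are assumptions for illustration, not Logstash's own patterns.

```python
import re
from datetime import datetime, timezone

# Sample lines from the slide (the Apache sample has no client IP in front).
TOMCAT = "May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR"
APACHE = '- - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630'
IIS = ("2012-05-02 17:42:15 - 80 GET /images/favicon.ico - 200 "
       "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)")

TOMCAT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (?P<logger>\S+) (?P<method>\S+)")
APACHE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)')


def parse_tomcat(line):
    m = TOMCAT_RE.match(line)
    if not m:
        return None
    # The Tomcat line carries no time zone, so the result stays naive (server-local time).
    ts = datetime.strptime(m["ts"], "%b %d, %Y %I:%M:%S %p")
    return {"source": "tomcat", "timestamp": ts.isoformat(), "logger": m["logger"]}


def parse_apache(line):
    m = APACHE_RE.search(line)
    if not m:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")  # keeps the +0530 offset
    return {"source": "apache", "timestamp": ts.isoformat(),
            "method": m["method"], "path": m["path"], "status": int(m["status"])}


def parse_iis(line):
    # Assumed field order: date time c-ip s-port method uri-stem uri-query status user-agent
    parts = line.split()
    if len(parts) < 9:
        return None
    try:
        ts = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        status = int(parts[7])
    except ValueError:
        return None
    ts = ts.replace(tzinfo=timezone.utc)  # IIS W3C logs record times in UTC
    return {"source": "iis", "timestamp": ts.isoformat(),
            "method": parts[4], "path": parts[5], "status": status}


def normalize(line):
    """Try each known format in turn; keep unparseable lines instead of losing them."""
    for parser in (parse_apache, parse_iis, parse_tomcat):
        event = parser(line)
        if event:
            return event
    return {"source": "unknown", "message": line}


for sample in (TOMCAT, APACHE, IIS):
    print(normalize(sample))
```

Once every source yields the same timestamp field, events from Tomcat, Apache and IIS can be sorted, filtered and charted together.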
  7. DECENTRALIZED LOGS: With a setup of one or two servers, finding information in the logs means running cat or tail and piping the output to grep; this stops being practical as the number of servers grows.
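On one machine, something like tail -f app.log | grep ERROR is enough. The sketch below, using hypothetical in-memory logs for two hypothetical hosts (web1, web2), shows what the same search needs once logs live on several servers: each host's time-ordered stream has to be interleaved by timestamp before a single grep-style filter runs over the result. It assumes every line starts with a sortable ISO-8601 timestamp.

```python
import heapq
import re

# Hypothetical per-server logs, each already in time order (ISO-8601 prefix).
SERVER_LOGS = {
    "web1": ["2016-05-28T10:00:01 INFO login ok",
             "2016-05-28T10:00:05 ERROR db timeout"],
    "web2": ["2016-05-28T10:00:03 ERROR db timeout",
             "2016-05-28T10:00:04 INFO login ok"],
}


def tagged(host, lines):
    """Yield (timestamp, host, line) so streams can be merged by time."""
    for line in lines:
        yield line.split(" ", 1)[0], host, line


def centralized_grep(logs, pattern):
    """Merge every host's time-ordered stream, then filter like grep."""
    regex = re.compile(pattern)
    streams = [tagged(host, lines) for host, lines in logs.items()]
    for _ts, host, line in heapq.merge(*streams):
        if regex.search(line):
            yield f"{host}: {line}"


for hit in centralized_grep(SERVER_LOGS, "ERROR"):
    print(hit)
```

A central log store does this merge once, at ingest time, instead of on every search.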
  8. Elasticsearch
  9. Elasticsearch - Key features • Schema-free, REST- and JSON-based document store • Distributed and horizontally scalable • Open source: Apache License 2.0 • Zero configuration • Written in Java, extensible
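"Schema-free" in practice means Elasticsearch infers a mapping the first time it sees a field (dynamic mapping) instead of requiring a schema up front. The Python sketch below approximates those rules: booleans, integers and floats become boolean, long and float; objects become nested properties; date-like strings become date; other strings become text with a keyword sub-field. This is a simplification of recent Elasticsearch versions; exact types vary by version (2.x, current when this talk was given, used a single string type), and real date detection uses configurable formats rather than Python's fromisoformat.

```python
from datetime import datetime


def infer_type(value):
    """Approximate Elasticsearch dynamic mapping for one JSON value (simplified)."""
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    if isinstance(value, list):
        # An array is typed by its first element; an empty array adds no mapping.
        return infer_type(value[0]) if value else None
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # stand-in for Elasticsearch's date detection
            return {"type": "date"}
        except ValueError:
            return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    return None  # null values add no mapping


def infer_mapping(doc):
    mapping = {}
    for field, value in doc.items():
        field_type = infer_type(value)
        if field_type is not None:
            mapping[field] = field_type
    return mapping


event = {"@timestamp": "2016-05-28T10:00:00", "status": 200, "duration": 0.25,
         "cached": False, "path": "/favicon.ico", "geo": {"country": "TH"}, "tags": []}
print(infer_mapping(event))
```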
  10. Elasticsearch - Terms • Index: logical collection of data, possibly time-based; analogous to a database • Replication: read scalability, removes the single point of failure (SPOF) • Sharding: splits logical data over several machines; write scalability, control over data flows
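Elasticsearch picks the primary shard for a document with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID, and it never places a replica on the same node as its primary. The Python sketch below illustrates both ideas, plus a daily time-based index name in the style of Logstash's default logstash-YYYY.MM.dd. It uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, and allocate is a toy round-robin layout, not Elasticsearch's shard allocator; the document ID and node names are hypothetical.

```python
import zlib
from datetime import date


def daily_index(day, prefix="logstash"):
    """Time-based index name: one index per day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def shard_for(routing, num_primary_shards):
    # shard = hash(_routing) % number_of_primary_shards; _routing defaults to the doc id.
    # zlib.crc32 stands in for Elasticsearch's Murmur3 hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards, num_replicas, nodes):
    """Toy round-robin layout in which no replica shares a node with its primary."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep every copy apart")
    layout = {}
    for shard in range(num_primary_shards):
        copies = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout


print(daily_index(date(2016, 5, 28)))  # logstash-2016.05.28
print(shard_for("log-event-1", 5))
print(allocate(3, 1, ["node-a", "node-b", "node-c"]))
```

Because the modulus is the number of primary shards, that number is fixed when an index is created, which is one reason time-based indices are convenient: each new day's index can be sized differently.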
  11. Elasticsearch - Distributed and scalable
  12. Elasticsearch - Distributed and scalable
  13. Elasticsearch - Use cases • Product search engine: products grouped, with filtering • Scoring ✴ Possible influencing factors: age of the product, ordered in the last 24 h, in stock, no shipping costs, special offer, rating • Analytics ✴ Multidimensional aggregation (e.g. average revenue per category ID per day)
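The analytics example on this slide, average revenue per category ID per day, corresponds in Elasticsearch to a terms aggregation on the category with a daily date_histogram inside it and an avg metric, while the scoring factors are the kind of signals a function_score query can combine with text relevance. The Python sketch below computes the same aggregation over in-memory orders and combines the listed factors into one score. The field names (category_id, day, revenue, in_stock and so on) and every weight in product_score are hypothetical; the slide names the factors but not how they are weighted.

```python
from collections import defaultdict


def avg_revenue_per_category_per_day(orders):
    """Same result as terms(category_id) > date_histogram(day) > avg(revenue)."""
    totals = defaultdict(lambda: [0.0, 0])  # (category_id, day) -> [sum, count]
    for order in orders:
        bucket = totals[(order["category_id"], order["day"])]
        bucket[0] += order["revenue"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def product_score(text_score, product):
    """Hypothetical boosts for the factors listed on the slide."""
    boost = 1.0
    if product["in_stock"]:
        boost *= 1.5
    if product["free_shipping"]:
        boost *= 1.2
    if product["special_offer"]:
        boost *= 1.1
    if product["ordered_last_24h"]:
        boost *= 1.3
    boost *= 1 + product["rating"] / 5      # rating assumed to be on a 0-5 scale
    boost /= 1 + product["age_days"] / 365  # newer products rank higher
    return text_score * boost


orders = [
    {"category_id": 1, "day": "2016-05-27", "revenue": 100.0},
    {"category_id": 1, "day": "2016-05-27", "revenue": 50.0},
    {"category_id": 2, "day": "2016-05-27", "revenue": 30.0},
    {"category_id": 1, "day": "2016-05-28", "revenue": 10.0},
]
print(avg_revenue_per_category_per_day(orders))
```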
  14. Logstash
  15. Logstash • Manages events and logs • Collects, parses, enriches and stores data • Modular: many, many inputs and outputs • Open source: Apache License 2.0 • Ruby app • Part of the Elasticsearch family
  16. Why collect & centralize logs? • Access log files without system access • Shell scripting: too limited or slow • Use unique IDs for errors and aggregate them across your stack • Reporting (everyone can create their own report) • Bonus points: unify your data to make it easily searchable
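The point about unique error IDs is that once every service logs the same ID for a failure, a central store can group on it. A minimal sketch, assuming each centralized event is a dict with a service name and a hypothetical error_id field that is generated once (here with uuid4) where the failure starts and passed along to downstream services:

```python
import uuid
from collections import defaultdict


def new_error_id():
    """Generate the id once where the failure starts; pass it to downstream services."""
    return uuid.uuid4().hex


def aggregate_by_error_id(events):
    """Group centralized events by their shared error_id (a hypothetical field name)."""
    summary = defaultdict(lambda: {"count": 0, "services": set()})
    for event in events:
        error_id = event.get("error_id")
        if error_id is None:
            continue  # events without an id are not part of any error trail
        summary[error_id]["count"] += 1
        summary[error_id]["services"].add(event["service"])
    return dict(summary)


err = new_error_id()
events = [
    {"service": "web", "error_id": err, "message": "checkout failed"},
    {"service": "api", "error_id": err, "message": "payment call timed out"},
    {"service": "db", "error_id": err, "message": "lock wait timeout"},
    {"service": "web", "message": "GET /favicon.ico 200"},
]
print(aggregate_by_error_id(events))
```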
  17. Logstash - Architecture: Input → Filter → Output (pipeline diagram)
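A Logstash pipeline has these three stages, and its configuration file is organized the same way, with input, filter and output sections. The Python below is only an analogy for that flow, not Logstash code: generators stand in for plugins, a regex for a grok pattern, and a list for a real output. The _grokparsefailure tag mirrors the tag Logstash's grok filter adds when a pattern does not match; the log-level pattern and sample lines are hypothetical.

```python
import re


def lines_input(lines):
    """Input stage: wrap each raw line in an event, keeping the text under 'message'."""
    for line in lines:
        yield {"message": line}


def grok_like_filter(events, pattern):
    """Filter stage: extract named fields; tag events the pattern cannot parse."""
    regex = re.compile(pattern)
    for event in events:
        match = regex.search(event["message"])
        if match:
            event.update(match.groupdict())
        else:
            event.setdefault("tags", []).append("_grokparsefailure")
        yield event


def drop_filter(events, should_drop):
    """Filter stage: discard events that match a condition."""
    for event in events:
        if not should_drop(event):
            yield event


def list_output(events, sink):
    """Output stage: a list stands in for Elasticsearch, a file or a queue."""
    for event in events:
        sink.append(event)


LEVEL_PATTERN = r"^(?P<level>DEBUG|INFO|WARN|ERROR) (?P<text>.*)$"

sink = []
events = lines_input(["INFO service started", "DEBUG cache warmed", "garbled line"])
events = grok_like_filter(events, LEVEL_PATTERN)
events = drop_filter(events, lambda e: e.get("level") == "DEBUG")
list_output(events, sink)
print(sink)
```

Because each stage is a generator, events stream through one at a time, which loosely matches how a pipeline processes events as they arrive rather than in one batch.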
  18. Logstash - Inputs • Monitoring: collectd, graphite, ganglia, snmptrap, zenoss • Datastores: elasticsearch, redis, sqlite, s3 • Queues: rabbitmq, zeromq, kafka • Logging: eventlog, lumberjack, gelf, log4j, relp, syslog, varnishlog
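Many of these inputs receive text in a standard wire format. For the syslog input, BSD syslog (RFC 3164) prefixes each message with a priority value <PRI>, where PRI = facility × 8 + severity. A minimal Python sketch of that decoding step, tried on the example message from RFC 3164 itself (PRI 34 is facility 4, auth, with severity 2, critical); it handles only the PRI prefix, not the timestamp or hostname that follow.

```python
import re

# Severity names in numeric order, as listed in RFC 3164.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_pri(line):
    """Split a BSD-syslog line into facility number, severity name and the rest."""
    match = PRI_RE.match(line)
    if not match:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23 (local7) * 8 + severity 7
        return None
    return {"facility": pri // 8, "severity": SEVERITIES[pri % 8],
            "message": match.group(2)}


rfc_example = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(decode_pri(rfc_example))
```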
  19. Logstash - Filters • alter, anonymize, checksum, csv, drop, multiline • dns, date, extractnumbers, geoip, i18n, kv, noop, ruby, range • json, urldecode, useragent
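Two of these filters are easy to picture in code. Logstash's kv filter splits key=value text into fields (its field_split and value_split options default to a space and "="), and anonymize replaces sensitive values with hashes. The Python sketch below is a simplified analogy of both, not the plugins' code: it ignores kv's other options, strips double quotes from values, and uses HMAC-SHA256 with a hypothetical secret key for the hashing step.

```python
import hashlib
import hmac


def kv_filter(message, field_split=" ", value_split="="):
    """Turn 'a=1 b=2' style text into a dict of fields (simplified)."""
    fields = {}
    for pair in message.split(field_split):
        key, sep, value = pair.partition(value_split)
        if sep and key:  # skip tokens without '=' and empty keys
            fields[key] = value.strip('"')
    return fields


def anonymize(fields, keys, secret):
    """Replace the chosen values with a keyed hash so they stay joinable but unreadable."""
    result = dict(fields)
    for key in keys:
        if key in result:
            result[key] = hmac.new(secret, result[key].encode("utf-8"),
                                   hashlib.sha256).hexdigest()
    return result


SECRET = b"hypothetical-secret-key"
event = kv_filter('user=alice ip=10.0.0.1 action="login"')
print(anonymize(event, ["user", "ip"], SECRET))
```

A keyed hash keeps the same input mapping to the same output, so events from one user can still be grouped without exposing who the user is.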
  20. Logstash - Outputs • Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq • Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabbix • Notification: email, hipchat, irc, pagerduty, sns • Protocol: http, lumberjack, metriccatcher, stomp, …
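Of the stores listed, elasticsearch is the usual ELK destination, and Logstash's elasticsearch output writes through the Elasticsearch _bulk API. A bulk request body is newline-delimited JSON: an action line followed by a source line for each document, ending with a newline. A minimal Python sketch that only builds such a body (no HTTP call); the events are hypothetical, and the index name follows the daily logstash-YYYY.MM.dd convention. Elasticsearch versions from the time of this talk also expected a _type in each action line, which is omitted here.

```python
import json


def bulk_body(events, index):
    """Build an Elasticsearch _bulk body: an action line, then the document, per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


events = [
    {"@timestamp": "2016-05-28T10:00:01+07:00", "level": "INFO", "message": "login ok"},
    {"@timestamp": "2016-05-28T10:00:05+07:00", "level": "ERROR", "message": "db timeout"},
]
print(bulk_body(events, "logstash-2016.05.28"), end="")
```

Such a body is sent with a POST to the cluster's /_bulk endpoint; batching many events per request is what keeps indexing throughput high.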
  21. Kibana • Flexible analytics and data visualization platform
  22. Kibana
  23. Combining them: ELK
  24. Hands-on: ELK (diagram: six Web servers and Kafka)
  25. Q&A