Collecting logs from a stateless environment is a challenging part of the application lifecycle. Correlating business logs with operating-system metrics to provide insight is crucial for the entire organization. What aspects should you consider when designing your logging solution?
Docker Logging and analysing with Elastic Stack
1. Docker Logging and analysing with Elastic Stack
JAKUB HAJEK,
jakub.hajek@cometari.com, @_jakubhajek November 2019, Warsaw
www.devopsdays.pl
2. Introduction
• I am the owner and technical consultant working for Cometari
• I have been a system admin since 1998.
• Cometari is a solution company implementing DevOps culture, providing
consultancy, workshops and software services.
• Our areas of expertise are DevOps, Elastic Stack (log analysis), Cloud
Computing.
• We are deeply involved in the travel tech industry; however, our solutions go
much further than just integrating travel APIs.
3. —
“I strongly believe that implementing DevOps
culture, across the entire organisation, should
provide measurable value and solve the real
issue rather than generate a new one.”
4. Agenda
• A little bit of theory about logs.
• The major differences between the old-fashioned approach and the container world.
• Distributed logging with Elasticsearch and Fluentd
• Live logging demos:
• A simple example sending logs from a container to Fluentd
• A fully fledged environment running on Docker Swarm with an Elasticsearch
cluster, Kibana and Fluentd deployed
• The deployed application stack is multi-tier, including a Traefik frontend
and a backend application
7. What are logs?
• Logs are a stream of aggregated, time-ordered events collected from the
output streams
• The output streams are generated by processes and backing services
• Raw logs are typically text, with one event per line
• Backtraces from exceptions are usually multiline
• Logs have no beginning or end but flow continuously as long as the app is
operating.
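Because backtraces are multiline, a collector has to fold continuation lines into the preceding event instead of treating every line as a new one. A minimal sketch in Python (the timestamp pattern and sample lines are illustrative assumptions, not from the talk):

```python
import re

# A line that starts a new event begins with a timestamp (illustrative pattern);
# anything else is a continuation line, e.g. part of a backtrace.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def group_events(lines):
    events = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line  # fold continuation into previous event
    return events

stream = [
    "2019-11-20T10:00:00 INFO started",
    "2019-11-20T10:00:01 ERROR boom",
    "  at com.example.App.main(App.java:10)",
]
print(len(group_events(stream)))  # 2 events: the backtrace folds into the error
```

Real collectors (Fluentd's multiline parsers, for example) apply the same idea with configurable start-of-event patterns.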
8. Logging considerations
• Logging is not cheap. It requires lots of computing resources: storage, CPU, memory.
• Logging gets even more expensive if you want to search the logs and correlate data.
• Having "live" data accessible immediately can be more expensive still.
• Don't log everything; consider which data you are interested in (it's not free)
• Log retention time has to be considered (Curator, if you store logs in Elasticsearch)
• I recommend Elasticsearch for keeping logs as time-based data. It requires some
experience with Elasticsearch to provide a reliable environment for logs.
• Logging is a mess; logging is not fun, but we have to deal with it and build a
logging solution
9. Logging in production
• Service logs
• Web access logs
• Transaction Logs
• Distributed tracing
• System Logs
• Syslog, system and other logs
• Audit logs
• Basic operating system metrics (CPU, memory, load …)
Logs for Business: KPIs, machine learning, predictive analytics, …
Logs for Service: system monitoring, bottlenecks, troubleshooting, …
10. Logging is not the same as Monitoring
• Logging is recording in order to diagnose a system
• Monitoring is observing, checking and then recording
• A notification (usually called an alert) can be sent out to any notification
channel for both logging and monitoring
• The notification is triggered when specific criteria are met, e.g.
http_requests_response_code is 500 in the last 60 seconds
A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main_dlq
Plugin: <LogStash::Inputs::DeadLetterQueue pipeline_id=>"main", path=>"/usr/share/logstash/data/dead_letter_queue", id=>"830027210528f50ad1234fe96f0ccc5f8a6989bb0b2d944881373ec56e555357", commit_offsets=>true,
enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_32044710-aeb5-4303-ba0e-2feb2dd851e9", enable_metric=>true, charset=>"UTF-8">>
Error:
Exception: Java::JavaNio::BufferOverflowException
Stack: java.nio.HeapByteBuffer.put(java/nio/HeapByteBuffer.java:189)
eas_errors{errorType="CONTENT",provider="HRS",requestName="HotelAvailability",
errorId="1234",errorSeverity="2",startDate="2019-11-20T22:00:00",endDate="2019-11-21T21:59:59",} 10.0
14. The container world
                       Bare metal              Container world
Service architecture   Monolithic              Microservices
System image           Mutable                 Immutable
Local data             Persistent              Ephemeral
Network                Physical address        No fixed address
Environment            Manually / Automation   Orchestration tools
Logging                syslogd/rsync           ?
*There is nothing wrong with a monolithic system, as long as you can distinguish
boundaries in the system and move that domain to a service on demand!
15. What are the challenges with logs in the container world?
16. Logging challenges with Containers
• No permanent storage (containers are stateless and storage is ephemeral)
• No fixed physical address.
• No fixed mapping between servers and roles
• Lots of various application types
• Transfer logs immediately to a distributed logging infrastructure
• Push logs from containers
• Label logs with the service name or use tags
• Need to handle various log formats with regexps and grok patterns
18. Logging and Docker container strategy
• The application should write its messages to STDOUT
STDOUT
APPLICATION running in
Docker container
Hello World!
19. Logging and Docker container strategy
• The message is encapsulated in a JSON map structure by Docker (with the json-file logging driver).
Hello World!
{
  "log": "Hello World!",
  "stream": "stdout",
  "time": "timestamp"
}
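The json-file driver stores one such JSON map per line on the host, so each entry can be parsed back into its fields. A small sketch in Python (the sample line and timestamp are illustrative, not copied from a real container):

```python
import json

# A line as the json-file logging driver stores it on the Docker host
# (sample values are illustrative).
line = '{"log":"Hello World!\\n","stream":"stdout","time":"2019-11-20T10:00:00.000000000Z"}'

event = json.loads(line)
print(event["stream"], event["log"].rstrip())  # stdout Hello World!
```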
25. Treat logs as an event stream
• The application should be stateless and not store data or logs locally.
• Logs should not be written to local storage
• Logs should not be managed locally, e.g. with logrotate
• All logs should be treated as event streams
• Each running process writes its events to STDOUT and STDERR
• In a container-based environment, logs should be sent to STDOUT
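The bullets above can be sketched as a minimal stdout logger: the process emits one structured event per line and never opens a local log file. The field names and helper below are illustrative assumptions, not part of the talk:

```python
import json
import sys
import time

# Minimal "logs as an event stream" sketch: one JSON event per line,
# written straight to STDOUT, no local files, no logrotate.
def log_event(message, **fields):
    event = {"time": time.strftime("%Y-%m-%dT%H:%M:%S"), "msg": message, **fields}
    sys.stdout.write(json.dumps(event) + "\n")
    sys.stdout.flush()  # don't let buffering delay the stream

log_event("request served", status=200, path="/health")
```

Docker then captures this stream and hands it to whichever logging driver is configured.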
28. Log collectors for Central logging
• Logstash from Elastic Stack, Fluentd, Apache Flume and many more…
LOGS → LOG COLLECTOR → STORAGE
• Example storage options:
• S3, MongoDB, Hadoop, Elasticsearch
• file, forward, copy, stdout (useful for debugging)
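For the Elasticsearch storage option, a minimal Fluentd output section might look like this (the host name, tag pattern, and flush interval are illustrative; it assumes the fluent-plugin-elasticsearch plugin is installed):

```text
<match docker.**>
  @type elasticsearch
  host elasticsearch       # illustrative host name
  port 9200
  logstash_format true     # write time-based indices (logstash-YYYY.MM.DD)
  <buffer>
    flush_interval 5s      # buffer in memory/file before flushing
  </buffer>
</match>
```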
29. Fluentd data collector
• An extensible and reliable data collector.
• Unified Logging Layer: treats logs as JSON
• Pluggable architecture
• Supports memory- and file-based buffering to prevent inter-node data loss
• Built-in HA and load balancing
30. CORE
• Divide and conquer
• Buffering and retries
• Error Handling
• Message routing
• Parallelism
PLUGINS
• Read data
• Parse data
• Buffer data
• Write data
• Format data
31. Unifying logging layer
Services → Collector nodes (Fluentd) → Aggregator nodes (Fluentd) → Elasticsearch
• Applications generate raw logs
• Collector nodes convert raw log data into structured data
• Aggregator nodes aggregate the structured data
• Structured data is ready for analysis in Elasticsearch
32. An event in Fluentd
TAG: myapp.access
TIME: (current time)
RECORD: {"event": "data"}
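The event triple, and the tag-based routing it drives, can be modelled in a few lines of Python. This is a simplified sketch of Fluentd's match-pattern semantics ("*" matches one tag part, "**" matches any number of parts); the real rules have more cases:

```python
import re

# A Fluentd event is a (tag, time, record) triple; values are illustrative.
event = ("myapp.access", 1574244000, {"event": "data"})

def matches(pattern, tag):
    """Simplified <match> pattern check: '*' matches one tag part,
    '**' matches any run of parts. A sketch, not the full Fluentd rules."""
    regex = re.escape(pattern).replace(r"\*\*", r".*").replace(r"\*", r"[^.]+")
    return re.fullmatch(regex, tag) is not None

tag, _, record = event
print(matches("myapp.*", tag))    # True: routed to that output
print(matches("docker.**", tag))  # False: a different output (or none)
```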
34. Output plugin buffering (diagram)
An event (TAG, TIME, RECORD) is emitted through the ROUTER into the input → filter → output pipeline. Inside the output plugin, records are processed and formatted into chunks with metadata in the BUFFER stage, the chunks are then enqueued into the QUEUE, and the plugin tries to write each chunk to its destination.
source: https://docs.fluentd.org/output
35. Brief overview of configuration
• <source> where all the data comes from; events are submitted to the routing engine
• <match> tells Fluentd what to do!
• <filter> event processing pipeline
• INPUT -> filter 1 -> … -> filter N -> OUTPUT
• <system> system-wide directives
• <label> used for grouping filters and outputs for internal routing
• @include splits the config into multiple files and re-uses configuration
Source: https://docs.fluentd.org/configuration/config-file
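Putting those directives together, a minimal configuration might look like the fragment below (the port, tag patterns, and added record field are illustrative; the directive names themselves are standard Fluentd):

```text
<source>
  @type forward            # where the data comes from
  port 24224
</source>

<filter docker.*>          # event processing pipeline
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"   # enrich every event
  </record>
</filter>

<match docker.*>           # tell Fluentd what to do
  @type stdout             # useful for debugging
</match>
```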
37. Docker fluentd driver
• The logging driver sends container logs to Fluentd as structured log data
• Metadata: container_id, container_name, source, log
• --log-driver fluentd --log-opt tag=docker.{{.ID}} --log-opt fluentd-
address=tcp://fluenthost
• Messages are buffered until the connection is established.
• The data can be buffered before flushing
• Retry, max-retry, sub-second-precision…
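The same driver options can be set declaratively in a Compose file. A sketch, assuming the service and image names, and reusing the tag and address from the slide:

```yaml
version: "3.7"
services:
  app:
    image: myapp:latest              # illustrative image name
    logging:
      driver: fluentd
      options:
        tag: "docker.{{.ID}}"
        fluentd-address: "tcp://fluenthost:24224"
        fluentd-async-connect: "true"   # buffer messages until the connection is up
```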