Collecting logs from an entire stateless environment is one of the challenging parts of the application lifecycle. Correlating business logs with operating system metrics to provide insights is crucial for the entire organization. What aspects should you consider when designing your logging solution?
Docker Logging and analysing with Elastic Stack
1. Docker Logging and analysing with Elastic Stack
JAKUB HAJEK,
jakub.hajek@cometari.com, @_jakubhajek November 2019, Warsaw
www.devopsdays.pl
2. Introduction
• I am the owner of Cometari and work there as a technical consultant.
• I have been a system administrator since 1998.
• Cometari is a solution company implementing DevOps culture, providing
consultancy, workshops and software services.
• Our areas of expertise are DevOps, Elastic Stack (log analysis), Cloud
Computing.
• We are deeply involved in the travel tech industry; however, our solutions go
much further than just integrating travel APIs.
3. —
“I strongly believe that implementing DevOps
culture, across the entire organisation, should
provide measurable value and solve the real
issue rather than generate a new one.”
4. Agenda
• A little bit of the theory about logs.
• The major differences between the old-fashioned approach and the container world.
• Distributed logging with Elasticsearch and Fluentd
• Logging in practice, based on live demos:
• A simple example sending logs from container to Fluentd
• Fully fledged environment running on Docker Swarm with deployed:
Elasticsearch Cluster, Kibana and Fluentd
• The deployed application stack is multi-tier and includes a Traefik
frontend and a backend application
7. What are logs?
• Logs are the stream of aggregated, time ordered events collected from the
output stream
• The output stream can be generated by processes and backing services
• Raw logs are typically a text format with one event per line
• Backtraces from exceptions are usually multiline
• Logs have no beginning or end but flow continuously as long as the app is
operating.
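The points about one event per line and multiline backtraces can be illustrated with a minimal Python sketch. It assumes (this is not stated on the slide) that every new event starts with a date such as `2019-11-20`, and that any other line continues the previous event. The function name `group_events` and the sample lines are hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: a new event starts with an ISO-like date, e.g. "2019-11-20 ...".
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]")


def group_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per event, folding continuation lines into it."""
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # A line that looks like an event start closes the previous event.
        if EVENT_START.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


# Hypothetical raw stream: the second event carries a multiline backtrace.
raw = [
    "2019-11-20 22:00:00 INFO request handled",
    "2019-11-20 22:00:01 ERROR unhandled exception",
    "Traceback (most recent call last):",
    '  File "app.py", line 3, in <module>',
    "ZeroDivisionError: division by zero",
    "2019-11-20 22:00:02 INFO request handled",
]
events = list(group_events(raw))
for event in events:
    print(repr(event))
```

Log collectors usually offer a comparable multiline option, so in practice this grouping tends to live in the collector rather than in custom code.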
8. Logging considerations
• Logging is not cheap. It requires a lot of computing resources: storage, CPU, memory.
• Logging gets even more expensive if you want to search the logs and correlate data.
• Having “LIVE” data accessible immediately can be more expensive still.
• Don’t log everything; consider which data you are actually interested in (it is not free).
• Log retention time has to be considered (e.g. Curator, if you store logs in Elasticsearch).
• I recommend Elasticsearch for keeping logs as time-based data. It requires some
experience with Elasticsearch to provide a reliable environment for logs.
• Logging is a mess. Logging is not fun, but we have to deal with it and build a logging
solution.
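The retention bullet can be made concrete with a small Python sketch of the decision a retention job has to make. It assumes daily indices named `logstash-YYYY.MM.DD` (a common convention, but an assumption here) and only computes which names are past retention; it does not talk to Elasticsearch or Curator. The function name `expired_indices` is hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indices(names: Iterable[str], today: date, keep_days: int,
                    prefix: str = "logstash-") -> List[str]:
    """Return the daily index names that are older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not a time-based log index, leave it alone
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


# Hypothetical index list; with 7 days of retention only the first one expires.
indices = ["logstash-2019.11.01", "logstash-2019.11.13",
           "logstash-2019.11.19", ".kibana"]
to_delete = expired_indices(indices, today=date(2019, 11, 20), keep_days=7)
print(to_delete)
```

Keeping one index per day makes retention cheap: dropping a whole index costs far less than deleting individual documents.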
9. Logging in production
• Service logs
• Web access logs
• Transaction Logs
• Distributed tracing
• System Logs
• Syslog, system and other logs
• Audit logs
• Basic operating system metrics (CPU, memory, load …)
Logs for Business
KPI
Machine Learning
Predictive analytics
…
Logs for Service
System monitoring
Bottleneck
Troubleshooting
…
10. Logging is not the same as Monitoring
• Logging is recording to diagnose a system
• Monitoring is observing, checking and then recording
• A notification (usually called an alert) can be sent out to any notification
channel for both logging and monitoring
• The notification can be triggered when specific criteria are met, e.g.
Http_requests_response_code is 500 in the last 60 seconds
A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main_dlq
Plugin: <LogStash::Inputs::DeadLetterQueue pipeline_id=>"main", path=>"/usr/share/logstash/data/dead_letter_queue", id=>"830027210528f50ad1234fe96f0ccc5f8a6989bb0b2d944881373ec56e555357", commit_offsets=>true,
enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_32044710-aeb5-4303-ba0e-2feb2dd851e9", enable_metric=>true, charset=>"UTF-8">>
Error:
Exception: Java::JavaNio::BufferOverflowException
Stack: java.nio.HeapByteBuffer.put(java/nio/HeapByteBuffer.java:189)
eas_errors{errorType="CONTENT",provider="HRS",requestName="HotelAvailability",
errorId="1234",errorSeverity="2",startDate="2019-11-20T22:00:00",endDate="2019-11-21T21:59:59",} 10.0
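The notification criterion from slide 10 ("response code is 500 in the last 60 seconds") can be sketched as a sliding window in Python. This is only an illustration of the idea, not how any particular alerting tool implements it; the class name `ErrorRateAlert` and the threshold of three errors are assumptions.

```python
from collections import deque
from typing import Deque


class ErrorRateAlert:
    """Fire when at least `threshold` HTTP 500 responses fall inside the window."""

    def __init__(self, threshold: int, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._hits: Deque[float] = deque()  # timestamps of recent 500s

    def observe(self, timestamp: float, status_code: int) -> bool:
        if status_code == 500:
            self._hits.append(timestamp)
        # Forget errors that have slid out of the window.
        while self._hits and self._hits[0] <= timestamp - self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.threshold


# Hypothetical request stream: (seconds since start, response code).
alert = ErrorRateAlert(threshold=3)
stream = [(0, 500), (10, 200), (20, 500), (30, 500), (70, 200)]
fired = [alert.observe(ts, code) for ts, code in stream]
print(fired)
```

The same check works on a log stream or on a metric; only the source of the `(timestamp, status_code)` pairs changes.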
14. The container world
                       Bare metal               Container world
Service architecture   Monolithic               Microservices
System image           Mutable                  Immutable
Local data             Persistent               Ephemeral
Network                Physical address         No fixed address
Environment            Manually / Automation    Orchestration tools
Logging                syslogd/rsync            ?
*There is nothing wrong with a monolithic system
as long as you can distinguish boundaries in the system and
move a domain into a separate service on demand!
15. What are the challenges with logs in
container world?
16. Logging challenges with Containers
• No permanent storage (containers are stateless and storage is ephemeral)
• No fixed physical address.
• No fixed mapping between server and roles
• Lots of various application types
• Transfer logs immediately to distributed logging infrastructure
• Push logs from containers
• Label logs with the service name or use tags
• Need to handle various logs with regexp, GROK
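As an illustration of the last two bullets, the following Python sketch parses a web access log line with a regular expression using named groups (similar in spirit to a GROK pattern) and labels the result with a service name. The pattern targets the common access log layout, and the sample line, the `parse_access_line` name and the `frontend` label are assumptions made for the example.

```python
import re
from typing import Dict, Optional, Union

# Named groups play the role that field names play in a GROK pattern.
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line: str, service: str) -> Optional[Dict[str, Union[str, int]]]:
    """Turn one raw access log line into a structured, labelled record."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None  # unknown format: route it elsewhere instead of guessing
    record: Dict[str, Union[str, int]] = dict(match.groupdict())
    record["status"] = int(match.group("status"))
    record["service"] = service  # label the log with the service name
    return record


# Hypothetical sample line.
sample = '10.0.0.1 - - [20/Nov/2019:22:00:00 +0100] "GET /hotels HTTP/1.1" 200 512'
parsed = parse_access_line(sample, service="frontend")
print(parsed)
```

Every additional application type needs its own pattern, which is why structured (JSON) output from the application is often easier to handle than free text.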
18. Logging and Docker container strategy
• The application should write its messages to STDOUT
STDOUT
APPLICATION running in
Docker container
Hello World!
19. Logging and Docker container strategy
• Docker wraps the message in a JSON map (with the JSON logging driver).
Hello World!
{
"log": "Hello World!",
"stream": "stdout",
"time": "timestamp"
}
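To show how such a JSON map can be consumed, here is a minimal Python sketch that decodes one entry. The timestamp value and the trailing newline in the `log` field are illustrative assumptions, and `parse_docker_entry` is a hypothetical helper name.

```python
import json
from typing import Tuple

# Hypothetical entry in the shape shown on the slide.
line = ('{"log": "Hello World!\\n", "stream": "stdout", '
        '"time": "2019-11-20T22:00:00.000000000Z"}')


def parse_docker_entry(raw: str) -> Tuple[str, str, str]:
    """Return (stream, time, message) from one JSON-encoded log entry."""
    entry = json.loads(raw)
    # The timestamp is kept as a string here; parsing it is left to the consumer.
    return entry["stream"], entry["time"], entry["log"].rstrip("\n")


stream, time_str, message = parse_docker_entry(line)
print(stream, time_str, message)
```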
25. Treat logs as an event stream
• The application should be stateless and should not store data / logs locally.
• The application should not attempt to write logs to local storage
• Logs should not be managed locally, e.g. with logrotate
• All logs should be treated as event streams
• Each running process writes its events to STDOUT and STDERR
• In a container-based environment, logs should be sent to STDOUT
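A minimal sketch of this approach in Python, using only the standard `logging` module: the application emits one JSON object per event to STDOUT and leaves storage, rotation and shipping to the platform. The field names in the JSON object and the `build_logger` helper are assumptions for illustration.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Create a logger that writes to a stream only: no files, no rotation."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger


logger = build_logger("myapp")
logger.info("Hello World!")
```

Because the process only writes to a stream, the same image behaves identically on a laptop and in an orchestrated cluster; only the logging driver behind STDOUT changes.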
28. Log collectors for Central logging
• Logstash from Elastic Stack, Fluentd, Apache Flume and many more…
LOGS -> LOG COLLECTOR -> STORAGE
• Example storage options:
• S3, MongoDB, Hadoop, Elasticsearch
• file, forward, copy, stdout (useful for debugging)
29. Fluentd data collector
• An extensible and reliable data collector.
• Unified Logging Layer - treats logs as JSON
• Pluggable Architecture
• Supports memory and file based buffering to prevent inter-node data loss
• Built-in HA and load balancing
30. CORE
• Divide and conquer
• Buffering and retries
• Error Handling
• Message routing
• Parallelism
PLUGINS
• Read data
• Parse data
• Buffer data
• Write data
• Format data
31. Unifying logging layer
Services Services
Collector nodes
Aggregator Nodes
Elasticsearch
Fluentd
Application generates logs
Convert raw log data
into structured data
Aggregated structured data
Structured Data Ready for analysis
32. An event in Fluentd
TAG: myapp.access
TIME: (current time)
RECORD: {"event": "data"}
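The three parts of an event, and how a tag can drive routing, can be sketched in Python as follows. The matcher is a simplified model of tag patterns in which `*` stands for exactly one tag part and `**` for zero or more parts; it is an illustration, not Fluentd's implementation, and the names `Event` and `tag_matches` are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    tag: str                   # where the event comes from, e.g. "myapp.access"
    record: Dict[str, Any]     # the structured payload
    time: float = field(default_factory=time.time)  # when it happened


def _match(pattern: List[str], parts: List[str]) -> bool:
    if not pattern:
        return not parts
    head, rest = pattern[0], pattern[1:]
    if head == "**":  # zero or more tag parts
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:  # "*" is exactly one tag part
        return _match(rest, parts[1:])
    return False


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified tag matching on dot-separated parts."""
    return _match(pattern.split("."), tag.split("."))


event = Event(tag="myapp.access", record={"event": "data"})
print(tag_matches("myapp.*", event.tag))
print(tag_matches("other.*", event.tag))
```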
34. [Figure: event flow through a Fluentd output plugin. An event (TAG, TIME, RECORD) passes the ROUTER (input, filter, output), is EMITted into BUFFER chunks with their metadata, is ENQUEUEd into the QUEUE, and is then processed, formatted and written (try_write) by the OUTPUT.]
source: https://docs.fluentd.org/output
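The emit, enqueue and write steps of the buffer can be modelled with a small Python sketch. It is a toy model under stated assumptions: chunks are closed after a fixed number of records, retries happen immediately without any back-off, and a failing destination is signalled with `ConnectionError`. The class `BufferedOutput` and the `flaky_write` destination are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Record = Dict[str, int]


class BufferedOutput:
    """Stage records into chunks, queue full chunks, write them with retries."""

    def __init__(self, write: Callable[[List[Record]], None],
                 chunk_limit: int = 3, max_retries: int = 2) -> None:
        self._write = write
        self._chunk_limit = chunk_limit
        self._max_retries = max_retries
        self._stage: List[Record] = []              # chunk currently being filled
        self._queue: Deque[List[Record]] = deque()  # chunks waiting to be written

    def emit(self, record: Record) -> None:
        self._stage.append(record)
        if len(self._stage) >= self._chunk_limit:
            self.enqueue()

    def enqueue(self) -> None:
        if self._stage:
            self._queue.append(self._stage)
            self._stage = []

    def flush(self) -> int:
        """Write queued chunks in order; return how many were written."""
        written = 0
        while self._queue:
            chunk = self._queue[0]
            for _attempt in range(self._max_retries + 1):
                try:
                    self._write(chunk)
                    break
                except ConnectionError:
                    continue  # a real collector would wait before retrying
            else:
                return written  # retries exhausted: keep the chunk queued
            self._queue.popleft()
            written += 1
        return written


calls = {"n": 0}
delivered: List[Record] = []


def flaky_write(chunk: List[Record]) -> None:
    """Hypothetical destination that is unreachable on the first attempt."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("destination not reachable yet")
    delivered.extend(chunk)


out = BufferedOutput(flaky_write, chunk_limit=2, max_retries=1)
for i in range(5):
    out.emit({"event": i})
out.enqueue()  # close the partially filled chunk
written = out.flush()
print(written, len(delivered))
```

Because a chunk stays in the queue until a write succeeds, a temporary outage of the destination delays delivery instead of dropping events.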
35. Brief overview of configuration
• <source> where all the data come from, routing engine
• <match> Tell Fluentd what to do!
• <filter> Event processing pipeline
• INPUT -> filter 1 -> …. -> filter N -> OUTPUT
• <system> - system directive
• <label> used for grouping filters and outputs for internal routing
• @include split config into multiple files and re-use configuration
Source: https://docs.fluentd.org/configuration/config-file
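The INPUT -> filter 1 -> … -> filter N -> OUTPUT flow can be modelled in a few lines of Python. This is a conceptual sketch, not Fluentd configuration: a source yields records, each filter either returns a (possibly modified) record or drops it by returning `None`, and the output receives whatever survives. The filter names and the `backend` label are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
Filter = Callable[[Record], Optional[Record]]


def run_pipeline(source: Iterable[Record], filters: List[Filter],
                 output: Callable[[Record], None]) -> None:
    """Push every record from the source through the filters to the output."""
    for record in source:
        current: Optional[Record] = record
        for apply_filter in filters:
            current = apply_filter(current)
            if current is None:
                break  # a filter dropped the record
        else:
            output(current)


def drop_debug(record: Record) -> Optional[Record]:
    """Hypothetical filter: discard debug-level events."""
    return None if record.get("level") == "debug" else record


def add_service(record: Record) -> Optional[Record]:
    """Hypothetical filter: enrich the record with a service label."""
    return dict(record, service="backend")


collected: List[Record] = []
source = [
    {"level": "debug", "message": "cache miss"},
    {"level": "info", "message": "request handled"},
]
run_pipeline(source, [drop_debug, add_service], collected.append)
print(collected)
```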
37. Docker fluentd driver
• The logging driver sends container logs to Fluentd as structured log data
• Metadata: container_id, container_name, source, log
• --log-driver fluentd --log-opt tag=docker.{{.ID}} --log-opt fluentd-address=tcp://fluenthost
• Messages are buffered until connection is established.
• The data can be buffered before flushing
• Retry, max-retry, sub-second-precision…
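As a usage sketch, the options above can be assembled into a `docker run` command line from Python. The image name `myapp:latest`, the port `24224` and the helper `docker_run_args` are assumptions for illustration; `fluentd-async` is the option name in recent Docker releases (older releases used `fluentd-async-connect`), so check the documentation for the version you run.

```python
import shlex
from typing import List


def docker_run_args(image: str, fluentd_address: str,
                    tag: str = "docker.{{.ID}}",
                    async_connect: bool = True) -> List[str]:
    """Build the argument list for `docker run` with the fluentd logging driver."""
    args = [
        "docker", "run",
        "--log-driver", "fluentd",
        "--log-opt", f"fluentd-address={fluentd_address}",
        "--log-opt", f"tag={tag}",
    ]
    if async_connect:
        # Let the container start even if Fluentd is not reachable yet.
        args += ["--log-opt", "fluentd-async=true"]
    args.append(image)
    return args


# Hypothetical image and address.
args = docker_run_args("myapp:latest", "tcp://fluenthost:24224")
print(shlex.join(args))
```

The resulting list can be passed directly to `subprocess.run(args)`, which avoids shell quoting problems with the `{{.ID}}` template.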