This document discusses observability tools for distributed systems like Drupal websites. It recommends using Monolog for structured logging, the Prometheus monitoring system for metrics collection, and OpenTelemetry with Tempo for distributed tracing. The Observability suite module can integrate all three for Drupal. Monolog logs are scraped by Promtail and sent to Loki. Metrics are exposed via Prometheus and scraped. OpenTelemetry instruments code and sends traces to Tempo. This provides insights across logs, metrics and traces for observability of distributed applications.
6. Almost everyone is working with distributed systems.
There are microservices, containers, cloud, serverless, headless, message queues
and a lot of combinations of these technologies.
7. All of these increase the number of failures that systems may encounter,
because there are too many parts interacting.
And because distributed systems are so diverse, it’s
complex to understand present problems and predict
future ones.
8. Observability is a measure of how well
internal states of a system can be
inferred from knowledge of its external
outputs
9. We need more data to correlate classic metrics like CPU
and memory spikes with application behaviours
12. Tools
We need different tools and technologies to collect logs, traces
and metrics, and we need a way to query and visualize them.
Logs -> Monolog + Promtail + Loki
Metrics -> Prometheus
Traces -> OpenTelemetry + Tempo
Visualization -> Grafana
13. Tools
Grafana
Grafana lets you query, visualize, alert on
and understand metrics, traces and logs no
matter where they are stored.
grafana.com
16. Tools
Monolog
Monolog is a standard PHP library and it can
be included in a Drupal website using a
contrib module (drupal.org/project/monolog)
Monolog sends logs to files, sockets,
inboxes, databases and various web
services.
Monolog implements the PSR-3 interface,
which you can type-hint against in your code
to maximize interoperability.
github.com/Seldaek/monolog
17. Download the Monolog module using
Composer, to have both the Drupal module and the PHP library
composer require drupal/monolog:^2.2
18. The Monolog module doesn't have a UI; it's configured using YAML files,
for example
sites/default/monolog.services.yml
20. Then define handlers, formatters
and processors using service
container parameters. Here we're
configuring the default channel to
catch all log messages and to save
them using the
monolog.handler.rotating_file
service, in json format and after
being processed by a set of
processors
sites/default/monolog.services.yml
parameters:
  monolog.channel_handlers:
    default:
      handlers:
        - name: 'rotating_file'
          formatter: 'json'
  monolog.processors: [
    'message_placeholder', 'current_user',
    'request_uri', 'ip', 'referer',
    'filter_backtrace', 'introspection'
  ]
21. Add monolog.services.yml to
the list of container YAML files in the settings.php file
sites/default/settings.php
$settings['container_yamls'][] =
  DRUPAL_ROOT . '/sites/default/monolog.services.yml';
24. Structured logs make it simple to query them for any
sort of useful information.
We can write custom Monolog processors to add
the application’s custom data to our logs
(for example: pod name, cluster id, order id, …)
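A custom processor can be sketched as a plain callable that receives a log record and returns it with extra fields. This is only a minimal sketch: it assumes PHP 8 and the Monolog 2 record shape (an array with an `extra` key; Monolog 3 passes a `LogRecord` object instead). The class name `PodInfoProcessor` and the `POD_NAME` and `CLUSTER_ID` variable names are hypothetical, and registering the processor with the Monolog module is not shown.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical processor that adds deployment details to every log record.
 *
 * Assumes the Monolog 2 record shape: an array with an 'extra' key.
 */
final class PodInfoProcessor
{
    /** @param array<string, string> $env Environment variables to read from. */
    public function __construct(private array $env)
    {
    }

    public function __invoke(array $record): array
    {
        // POD_NAME and CLUSTER_ID are assumed variable names, not a standard.
        $record['extra']['pod_name'] = $this->env['POD_NAME'] ?? 'unknown';
        $record['extra']['cluster_id'] = $this->env['CLUSTER_ID'] ?? 'unknown';

        return $record;
    }
}

// Usage with a hand-built record, standing in for what Monolog would pass.
$processor = new PodInfoProcessor(['POD_NAME' => 'drupal-7c9f']);
$record = $processor(['message' => 'Order created', 'context' => [], 'extra' => []]);
echo json_encode($record['extra']), PHP_EOL;
```

Passing the environment in through the constructor keeps the processor easy to test without touching real environment variables.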
25. In a Cloud Native environment, the application runs on multiple servers (or pods). We
need a way to export all those logs generated by every instance of the application.
In this case our logs are files stored in the local filesystem of every instance.
We have to discover, scrape and send them to a log collector.
Promtail is an agent that ships the contents of local logs to a private Grafana Loki
instance or Grafana Cloud. It is usually deployed to every machine that runs applications
that need to be monitored.
29. 1. Logs are about storing specific events
2. Metrics are a measurement at a point in time for the system
30. Tools
Prometheus
Prometheus is an open-source systems
monitoring and alerting toolkit originally built
at SoundCloud. It is now a standalone open
source project and it’s maintained
independently of any company.
Prometheus collects and stores metrics as
time series data.
Prometheus was the second project to join
the Cloud Native Computing Foundation after
Kubernetes.
prometheus.io
31. Examples of metrics you might need:
● the number of times you receive an HTTP request
● how much time was spent handling requests
● how many users/nodes have been created (over time)
● the number of modules that need a security update
● the number of orders in an ecommerce site
● and many more …
32. There’s a module for that!
Observability suite
https://www.drupal.org/project/o11y
composer require drupal/o11y:1.x-dev
drush pm:enable o11y_metrics o11y_metrics_requests
34. Prometheus scrapes data from the /metrics endpoint at a configured interval.
PHP uses a shared-nothing architecture by default, so nothing stays in memory from one request to the next.
o11y therefore needs a way to store data between one scrape and the next.
The default implementation uses the database as a storage backend (but you can use Memcache
or Redis as well).
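The idea of keeping metric values in a storage backend between scrapes can be sketched as follows. This is an illustration only, assuming PHP 8: the `MetricStorage` interface, the `ArrayStorage` class, `renderMetrics()` and the `demo_orders_total` metric are hypothetical names, not the o11y module's API. The in-memory array merely stands in for a backend that would survive across requests.

```php
<?php

declare(strict_types=1);

/** Hypothetical storage contract; o11y's real interface may differ. */
interface MetricStorage
{
    public function increment(string $key, int $by): void;

    /** @return array<string, int> */
    public function all(): array;
}

/**
 * In-memory stand-in for a database, Memcache or Redis backend.
 * A real backend would persist values across PHP requests; this one does not.
 */
final class ArrayStorage implements MetricStorage
{
    /** @var array<string, int> */
    private array $values = [];

    public function increment(string $key, int $by): void
    {
        $this->values[$key] = ($this->values[$key] ?? 0) + $by;
    }

    public function all(): array
    {
        return $this->values;
    }
}

/** Renders stored counters in the Prometheus text exposition format. */
function renderMetrics(MetricStorage $storage): string
{
    $lines = [];
    foreach ($storage->all() as $name => $value) {
        $lines[] = "# TYPE {$name} counter";
        $lines[] = "{$name} {$value}";
    }

    return implode("\n", $lines) . "\n";
}

$storage = new ArrayStorage();
// Each call below represents work done during a separate request.
$storage->increment('demo_orders_total', 1);
$storage->increment('demo_orders_total', 1);
// A later scrape of /metrics reads the accumulated value.
$output = renderMetrics($storage);
echo $output;
```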
35. The main module (o11y_metrics) provides the following metrics:
PHP info
Node count: total and with bundle labels
Node revision count: total and with bundle labels
Extensions: list of modules/themes/profiles installed with name and version labels
Queue size: list of queues with number of items in them
User count: total, with status (active/blocked) and role labels
A set of submodules can be installed to provide additional metrics:
o11y_metrics_cache: total cache hits and misses, with bin labels; tag invalidations, with tag and request path labels
o11y_metrics_config: whether the Drupal config is out of sync
o11y_metrics_database: histograms for time spent on select queries, with database target name and route labels
o11y_metrics_requests: histograms for time spent on requests, with http method, route name and http code status labels
o11y_metrics_update: info about existing core/module/theme updates
o11y_metrics_comment: comments count, with status labels
36. The module exposes a URL with metrics in
Prometheus format (/metrics)
# HELP drupal_http_requests Timing metrics for requests.
# TYPE drupal_http_requests histogram
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.005"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.01"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.025"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.05"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.075"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.1"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.25"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.5"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="0.75"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="1"} 0
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="2.5"} 4
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="5"} 6
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="7.5"} 6
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="10"} 6
drupal_http_requests_bucket{method="GET",route="user_admin_create",status="2xx",le="+Inf"} 6
drupal_http_requests_count{method="GET",route="user_admin_create",status="2xx"} 6
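Histogram buckets are cumulative, so each `le` line counts every request at or below that bound. A quantile can be estimated from them by linear interpolation inside the bucket that contains the requested rank, which is the idea behind PromQL's `histogram_quantile`. The sketch below assumes PHP 8 and is not Prometheus's implementation; `estimateQuantile()` is a hypothetical helper, and the bucket list is a condensed copy of the sample output above.

```php
<?php

declare(strict_types=1);

/**
 * Estimates a quantile from cumulative histogram buckets.
 *
 * @param array<int, array{float, int}> $buckets Pairs of [upper bound, cumulative count],
 *   sorted by bound, with INF as the last bound.
 */
function estimateQuantile(float $q, array $buckets): float
{
    $total = $buckets[array_key_last($buckets)][1];
    if ($total === 0) {
        return NAN;
    }

    $rank = $q * $total;
    $prevBound = 0.0;
    $prevCount = 0;
    foreach ($buckets as [$bound, $count]) {
        // Pick the first non-empty bucket whose cumulative count reaches the rank.
        if ($count >= $rank && $count > $prevCount) {
            if (is_infinite($bound)) {
                // The rank lies beyond the last finite bound.
                return $prevBound;
            }
            $fraction = ($rank - $prevCount) / ($count - $prevCount);

            return $prevBound + ($bound - $prevBound) * $fraction;
        }
        $prevBound = $bound;
        $prevCount = $count;
    }

    return NAN;
}

// Condensed from the sample: no requests up to 1s, 4 up to 2.5s, 6 up to 5s.
$buckets = [[1.0, 0], [2.5, 4], [5.0, 6], [7.5, 6], [10.0, 6], [INF, 6]];
$median = estimateQuantile(0.5, $buckets);
$p95 = estimateQuantile(0.95, $buckets);
echo "median: {$median}s, p95: {$p95}s", PHP_EOL;
```

The estimate is only as precise as the bucket boundaries: here all that is known is that four requests took between 1 and 2.5 seconds.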
40. 1. Logs are about storing specific events
2. Metrics are a measurement at a point in time for the system
3. Distributed traces deal with information that is request-scoped
41. Tools
OpenTelemetry
OpenTelemetry is a collection of tools, APIs,
and SDKs. It can be used to instrument,
generate, collect, and export telemetry data
(metrics, logs, and traces) to help analyze
software’s performance and behavior.
OpenTelemetry is an incubating project from
the Cloud Native Computing Foundation,
created through the merger of OpenCensus
(from Google) and OpenTracing (itself a CNCF project).
The data collected with OpenTelemetry is
vendor-agnostic and can be exported in
many formats.
https://opentelemetry.io
42. Main cloud vendor support for OpenTelemetry
● AWS Distro for OpenTelemetry: aws-otel.github.io/
● Google Cloud OpenTelemetry: google-cloud-opentelemetry.readthedocs.io
● Azure Monitor OpenTelemetry:
docs.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-overview
44. The OpenTelemetry Collector offers a vendor-agnostic implementation of how to receive, process and export
telemetry data. It removes the need to run, operate, and maintain multiple agents/collectors. It allows your service to
offload data quickly and the collector can take care of additional handling like retries, batching, encryption or even
sensitive data filtering.
45. Tools
Tempo
Grafana Tempo is an open source,
easy-to-use, and high-scale distributed
tracing backend. Tempo is cost-efficient,
requiring only object storage to operate, and
is deeply integrated with Grafana,
Prometheus, and Loki. Tempo can ingest
common open source tracing protocols,
including Jaeger, Zipkin, and
OpenTelemetry.
grafana.com/oss/tempo
46. We will use the Observability suite module to instrument our
application
Internally the module uses OpenTelemetry to do the hard work
drush pm:enable o11y_traces
47. $class_loader->addPsr4('Drupal\\tracer\\', [ __DIR__ . '/../../modules/contrib/tracer/src']);
$settings['container_base_class'] = '\Drupal\tracer\DependencyInjection\TraceableContainer';
$settings['tracer_plugin'] = 'o11y_tracer';
sites/default/settings.php
There should be only one module that instruments the code at a time. We need to replace a lot
of services and subsystems with traceable versions.
To avoid code duplication we created a third project: Tracer
(https://www.drupal.org/project/tracer) which both WebProfiler and Observability suite
depend on.
https://blog.sparkfabrik.com/en/webprofiler-updates
48. Per-process logging and metric monitoring have their
place, but neither can reconstruct the elaborate
journeys that transactions take as they propagate
across a distributed system. Distributed traces are
these journeys
49. Take, for example, a Drupal 10 website
that renders a page with some data that comes from a
remote microservice
53. Observability suite automatically instruments
● Events
● Twig templates
● HTTP calls
● Database queries
● Services (optional)
but you can trace your own code too!
54. class MicroserviceController extends ControllerBase {

  public function view() {
    // Call the remote microservice; HTTP calls are instrumented automatically.
    $response = $this->httpClient->get('http://ddev-drupal10-microservice:8080/endpoint2');
    $json = json_decode($response->getBody()->getContents());

    $this->someComplexMethod();

    return [...];
  }

  private function someComplexMethod() {
    // Open a custom span, do the work, then close the span.
    $tracer = \Drupal::service('tracer.tracer');
    $span = $tracer->start('custom', 'someComplexMethod',
      ['someAttribute' => 'someValue']
    );
    sleep(1);
    $tracer->stop($span);
  }

}
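If the traced code throws, a span that was started is never stopped. One way to guard against that is to wrap the work in try/finally. The sketch below assumes PHP 8 and uses a stand-in `FakeTracer` that only mirrors the `start()` and `stop()` calls from the controller above; both `FakeTracer` and the `traced()` helper are hypothetical, and the real tracer service returns its own span type rather than a string.

```php
<?php

declare(strict_types=1);

/** Stand-in tracer that only records the start()/stop() calls it receives. */
final class FakeTracer
{
    /** @var list<string> */
    public array $events = [];

    public function start(string $category, string $name, array $attributes = []): string
    {
        $this->events[] = "start:{$name}";

        // A string stands in for the span object of a real tracer.
        return $name;
    }

    public function stop(string $span): void
    {
        $this->events[] = "stop:{$span}";
    }
}

/** Runs $work inside a span and stops the span even if $work throws. */
function traced(FakeTracer $tracer, string $name, callable $work): mixed
{
    $span = $tracer->start('custom', $name);
    try {
        return $work();
    } finally {
        $tracer->stop($span);
    }
}

$tracer = new FakeTracer();
try {
    traced($tracer, 'someComplexMethod', function (): void {
        throw new RuntimeException('remote call failed');
    });
} catch (RuntimeException $e) {
    // The exception still propagates; the span was stopped first.
}
print_r($tracer->events);
```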
58. One last thing we need is to correlate traces with logs,
so when we find a problem with a request we can go
from the trace to the logs (and vice versa)
59. The o11y module provides a new processor for
Monolog that adds a trace_id argument to every log
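What such a processor does can be sketched in a few lines. This is not the o11y module's code: `TraceIdProcessor` is a hypothetical class, the record is assumed to have the Monolog 2 array shape, placing the id under `extra` is an assumption, and the trace id is a sample value. PHP 8 is assumed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical processor that copies the current trace id into each log record.
 *
 * Assumes the Monolog 2 record shape: an array with an 'extra' key.
 */
final class TraceIdProcessor
{
    /** @param \Closure(): ?string $currentTraceId Returns the active trace id, if any. */
    public function __construct(private \Closure $currentTraceId)
    {
    }

    public function __invoke(array $record): array
    {
        $traceId = ($this->currentTraceId)();
        if ($traceId !== null) {
            // Storing the id under 'extra' is an assumption made for this sketch.
            $record['extra']['trace_id'] = $traceId;
        }

        return $record;
    }
}

// Sample trace id; a real one would come from the active trace.
$processor = new TraceIdProcessor(fn (): ?string => '4bf92f3577b34da6a3ce929d0e0e4736');
$record = $processor(['message' => 'Remote call failed', 'extra' => []]);
echo json_encode($record), PHP_EOL;
```

With the same id present in both the trace and the JSON log line, a log query that filters on `trace_id` returns the log entries written during that request.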