Prometheus is an open source monitoring project used to gather metrics.
It has many capabilities built in, such as service discovery, which make it well suited to automated environments.
This talk gives a brief introduction to Prometheus, covers the latest developments, and then offers practical tips and examples on how to use it in an automated world.
3. • Applications are short lived
• Updated often
• Infrastructure changes
(Nothing new...)
A fast changing world
@roidelapluie
4. • Monitoring an infrastructure
• Monitoring user experience
... together (dev&ops)
Monitoring in a "fast changing world"
5. CPU Usage, Disk space, Memory, Open file descriptors, ...
Infrastructure monitoring
6. Request Rate, Request Errors, Request Duration
Utilization, Saturation, Errors
User experience monitoring
RED method by Tom Wilkie, USE method by Brendan Gregg
7. • High level overview of the state of a service/component
• Performance
• Availability
• Technical components
What is going on?
What is monitoring?
8. • Understand how your services behave
• As if you were in their place
• Without specific code
Why is this going on?
What's observability?
9. • Monitoring is required
• Some monitoring systems are designed for observability
• If lucky, monitoring is enough
• Observability is removing luck
How do monitoring and observability connect?
17. • Open Source monitoring solution
• Graduated CNCF Project
• Born in 2012, publicly announced in 2015
• Collects metrics
• Plenty of integrations
• Service discovery, e.g. for Kubernetes
• Easy to use query language
• Built-in alerting
Prometheus
18. • A community
• A server and many other components
• An ecosystem
What "Prometheus" means
19. • Open Source
• Pull-based Monitoring over HTTP
• Powerful query language
• Optimized TSDB
The Prometheus server
26. Prometheus scrapes metrics over HTTP.
caddy_http_requests_total{code="200",method="POST",path="/load"} 1
Dimensional data model, for filtering and aggregation.
Metrics and Labels
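To make the dimensional model concrete, here is a minimal Python sketch that parses exposition lines like the one above and then filters and aggregates by label. It is a simplification, not the Prometheus parser: it ignores escaped quotes, timestamps, and comment lines. The two extra sample lines and the helper names parse_sample and sum_by are made up for illustration.

```python
import re
from collections import defaultdict

# Simplified patterns: no escaped quotes, no optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def sum_by(samples, label):
    """Aggregate sample values by the value of a single label."""
    totals = defaultdict(float)
    for _name, labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


LINES = [
    'caddy_http_requests_total{code="200",method="POST",path="/load"} 1',
    # The next two lines are hypothetical, added only to have something to aggregate.
    'caddy_http_requests_total{code="200",method="GET",path="/load"} 4',
    'caddy_http_requests_total{code="500",method="GET",path="/load"} 2',
]

if __name__ == "__main__":
    samples = [parse_sample(line) for line in LINES]
    only_get = [s for s in samples if s[1].get("method") == "GET"]  # filtering
    print(sum_by(samples, "code"))                                  # aggregation
    print(len(only_get))
```

In Prometheus itself the same filtering and aggregation are written in the query language rather than in application code.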
29. • Metrics do not represent problems
• Metrics represent a state, give insights
• Metrics can be graphed
• You can alert based on them
Metrics and monitoring
30. In general, you can just expose counters and let the monitoring server do the
real maths.
That keeps the overhead on applications very low.
Exposed metrics are "raw"
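As a rough illustration of "letting the server do the maths", the Python sketch below turns raw counter samples into a per-second rate, treating any decrease as a counter reset (for example after an application restart). It is a simplified approximation of the idea behind a rate calculation, not the actual PromQL implementation, which also extrapolates to the window boundaries. The function names and sample values are assumptions.

```python
def counter_increase(samples):
    """Total increase over (timestamp_seconds, value) pairs sorted by time.

    A value lower than its predecessor is treated as a counter reset,
    so the new value is counted as an increase from zero.
    """
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter restarted from zero
    return increase


def per_second_rate(samples):
    """Average per-second increase, or None if it cannot be computed."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return counter_increase(samples) / elapsed


if __name__ == "__main__":
    # Hypothetical scrapes every 30 seconds.
    print(per_second_rate([(0, 100), (30, 160), (60, 220)]))
```

The application only increments a number; everything else happens on the monitoring side.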
36. Let's see what makes Prometheus play nicely with automation tools.
Automating Prometheus
37. • Works on your machine
• Container ready
• Not tied to Kubernetes (see prometheus-operator)
Deploy anywhere
38. • Reloads on SIGHUP
• /-/reload endpoint (--web.enable-lifecycle)
• Ongoing work to keep reducing the overhead of reloads
Reloading Prometheus
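An automation tool can trigger the reload endpoint with a plain HTTP POST. The Python sketch below assumes a server started with --web.enable-lifecycle and reachable at http://localhost:9090; that address and the function names are assumptions for illustration.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build (but do not send) the POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url="http://localhost:9090", timeout=5.0):
    """Send the reload request; urlopen raises on connection or HTTP errors."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


if __name__ == "__main__":
    request = build_reload_request()
    print(request.get_method(), request.full_url)
```

Splitting request construction from sending keeps the first part easy to check without a running server.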
40. Plenty of situations do not require a reload of Prometheus:
• Password files
• TLS certificates
Prometheus will read them before use, no reload needed!
Not reloading Prometheus
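The "read before use" pattern can be sketched in a few lines of Python: instead of caching a secret at startup, the file is read each time the value is needed, so an external tool can rotate it at any moment. This only illustrates the pattern and is not how Prometheus implements it internally; the class and file names are hypothetical.

```python
import tempfile
from pathlib import Path


class FileBackedSecret:
    """A secret that is re-read from disk on every use, never cached."""

    def __init__(self, path):
        self._path = Path(path)

    def get(self):
        return self._path.read_text(encoding="utf-8").strip()


def demo_rotation():
    """Rotate a password file and show that the next read sees the new value."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "password"
        path.write_text("first-secret\n", encoding="utf-8")
        secret = FileBackedSecret(path)
        before = secret.get()
        path.write_text("rotated-secret\n", encoding="utf-8")  # external rotation
        after = secret.get()
    return before, after


if __name__ == "__main__":
    print(demo_rotation())
```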
41. HashiCorp Vault enables retrieving temporary secrets and writing them to a file.
./vault agent -config vault-agent.hcl
Using vault
https://inuits.eu/blog/prometheus-consul-vault-228/
44. The "web-config" file is read on every request:
• No need to reload
• Instantly change passwords, cert files
Shared config format between Prometheus and exporters!
TLS and basic auth
45. Prometheus has a snapshot API.
Enable with --web.enable-admin-api
curl -d{} http://localhost:9090/api/v1/admin/tsdb/snapshot
Prometheus TSDB is made of immutable blocks. Snapshots use hard links.
Backups
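Why immutable blocks plus hard links make snapshots cheap can be shown with a small Python sketch: a hard link is a second name for the same data on disk, so nothing is copied, and the data survives even if the original name is removed. The directory layout and file names below are made up and do not reflect the real TSDB block layout; the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile
from pathlib import Path


def snapshot_with_hard_links(block_dir, snapshot_dir):
    """Hard-link every regular file of block_dir into snapshot_dir."""
    block_dir, snapshot_dir = Path(block_dir), Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for source in sorted(block_dir.iterdir()):
        if source.is_file():
            destination = snapshot_dir / source.name
            os.link(source, destination)  # new name, same data on disk
            linked.append(destination)
    return linked


def demo_snapshot():
    with tempfile.TemporaryDirectory() as tmp:
        block = Path(tmp) / "block"
        block.mkdir()
        (block / "chunk").write_bytes(b"immutable data")
        links = snapshot_with_hard_links(block, Path(tmp) / "snapshot")
        same_file = os.path.samefile(block / "chunk", links[0])
        (block / "chunk").unlink()      # drop the original name
        data = links[0].read_bytes()    # the snapshot is still readable
    return same_file, data


if __name__ == "__main__":
    print(demo_snapshot())
```

Because the files never change after being written, the linked snapshot stays consistent and can be copied away at leisure.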
47. • Easier to know what's down with Pull
• Easy debugging (curl)
• Easier to spread the load
• Central configuration point
• High Availability
Prometheus pull model
48. • Prometheus must know what to pull
• Source of Truth
• Service Discovery != Auto discovery
• Event based when possible
Service Discovery
49. • Kubernetes
• Consul
• Cloud providers (Azure, AWS, GCP, DigitalOcean, Hetzner, Scaleway, Linode)
• Docker & Docker Swarm
• And more! 20+ external SD in total.
Sources of Truth
50. • Static SD: in the main Prometheus config
• File SD: Files on disk
• HTTP SD: HTTP endpoints
Generic Service Discovery
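File SD and HTTP SD share the same payload: a JSON list of target groups, each with "targets" and optional "labels". The Python sketch below builds such a list from a hypothetical inventory and writes it to a file. The inventory shape, the "env" label, and the write-then-rename step are assumptions chosen for the example, the latter so that a reader never sees a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def to_target_groups(inventory):
    """Group hypothetical inventory entries by environment."""
    groups = {}
    for item in inventory:
        groups.setdefault(item["env"], []).append(f'{item["host"]}:{item["port"]}')
    return [
        {"targets": sorted(targets), "labels": {"env": env}}
        for env, targets in sorted(groups.items())
    ]


def write_file_sd(path, groups):
    """Write the target groups to a temporary file, then rename it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_name, path)


def round_trip(groups):
    """Write the groups to a throwaway directory and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "targets.json"
        write_file_sd(path, groups)
        return json.loads(path.read_text(encoding="utf-8"))


INVENTORY = [  # made-up hosts
    {"host": "10.0.0.1", "port": 9100, "env": "prod"},
    {"host": "10.0.0.2", "port": 9100, "env": "prod"},
    {"host": "10.0.0.3", "port": 9100, "env": "staging"},
]

if __name__ == "__main__":
    print(json.dumps(to_target_groups(INVENTORY), indent=2))
```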
https://inuits.eu/blog/prometheus-http-service-discovery/
52. • Both integrate your own SD systems into Prometheus
• File SD is event based (inotify)
• HTTP SD can be integrated in your apps
File SD vs HTTP SD
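Integrating HTTP SD into an application can be as small as one endpoint that answers with HTTP 200, a Content-Type of application/json, and the list of target groups. The Python sketch below uses only the standard library; the current_targets function, its addresses, and the port are hypothetical stand-ins for whatever the application actually knows about its own instances.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def current_targets():
    """Hypothetical: ask the application which instances exist right now."""
    return [{"targets": ["10.0.0.1:8080", "10.0.0.2:8080"], "labels": {"app": "shop"}}]


def render_http_sd(groups):
    """Return (status, headers, body) for an HTTP SD response."""
    body = json.dumps(groups).encode("utf-8")
    headers = {"Content-Type": "application/json", "Content-Length": str(len(body))}
    return 200, headers, body


class HttpSdHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = render_http_sd(current_targets())
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the SD endpoint until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), HttpSdHandler).serve_forever()


if __name__ == "__main__":
    status, headers, body = render_http_sd(current_targets())
    print(status, headers["Content-Type"], body.decode("utf-8"))
```

Unlike File SD, which reacts to file change events, this endpoint is simply fetched again at each refresh.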
53. Labels can be used to configure targets.
• __address__: 127.0.0.1:9090
• __metrics_path__: /metrics
• __scheme__: http or https
• __param_<name>: http parameter
• __scrape_interval__, __scrape_timeout__: 1m
Labels
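How these special labels combine into the URL that gets scraped can be sketched in Python as follows. The defaults used here (http and /metrics) match the usual scrape configuration defaults; the function name and the example label values are assumptions.

```python
from urllib.parse import urlencode


def scrape_url(labels):
    """Assemble the scrape URL from a target's special labels."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    address = labels["__address__"]  # required: host:port of the target
    params = {
        name[len("__param_"):]: value
        for name, value in sorted(labels.items())
        if name.startswith("__param_")
    }
    query = "?" + urlencode(params) if params else ""
    return f"{scheme}://{address}{path}{query}"


if __name__ == "__main__":
    print(scrape_url({"__address__": "127.0.0.1:9090"}))
    print(scrape_url({
        "__address__": "127.0.0.1:9115",   # hypothetical exporter address
        "__scheme__": "https",
        "__metrics_path__": "/probe",
        "__param_module": "http_2xx",
    }))
```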
54. Additionally, extra labels are added by SD.
• __meta_kubernetes_pod_label_app
• __meta_digitalocean_region
• __meta_linode_public_ipv6
• __meta_scaleway_instance_status
Meta labels
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
55. A fundamental principle in Prometheus.
Transform input labels into a new set of labels.
Relabeling
56. • Rename, merge, replace labels
• Conditionally drop label sets
• Only keep matching label sets
Relabeling actions
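The three most common actions can be modelled in a short Python sketch. It mirrors the general semantics (source label values joined with a separator, a fully anchored regular expression, $1-style references in the replacement) but it is a simplified model, not the Prometheus implementation, and it covers only replace, keep, and drop. The rule dictionaries reuse the field names of the relabeling configuration; the label values are invented.

```python
import re


def relabel(labels, rule):
    """Apply one rule; return the new label dict, or None if the set is dropped."""
    separator = rule.get("separator", ";")
    value = separator.join(labels.get(name, "") for name in rule.get("source_labels", []))
    match = re.fullmatch(rule.get("regex", "(.*)"), value)  # anchored match
    action = rule.get("action", "replace")
    if action == "keep":
        return dict(labels) if match else None
    if action == "drop":
        return None if match else dict(labels)
    if action == "replace":
        result = dict(labels)
        if match:
            # Translate $1 / ${1} references into Python's \g<1> syntax.
            template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", rule.get("replacement", "$1"))
            new_value = match.expand(template)
            if new_value:
                result[rule["target_label"]] = new_value
            else:
                result.pop(rule["target_label"], None)  # empty value removes the label
        return result
    raise ValueError(f"unsupported action in this sketch: {action}")


def apply_rules(labels, rules):
    """Run rules in order; stop as soon as the label set is dropped."""
    for rule in rules:
        labels = relabel(labels, rule)
        if labels is None:
            return None
    return labels


RULES = [
    {"action": "keep", "source_labels": ["__meta_kubernetes_pod_label_app"], "regex": "shop|cart"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_pod_label_app"], "target_label": "app"},
]

if __name__ == "__main__":
    print(apply_rules({"__address__": "10.0.0.5:8080",
                       "__meta_kubernetes_pod_label_app": "shop"}, RULES))
```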
57. • Get lots of labels as input
• Turn them into targets
• Remove labels prefixed with __
• Can use "special labels"
Target relabeling
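The last step of target relabeling, sketched in Python: once the rules have run, labels starting with a double underscore are removed, and if no instance label was set it defaults to the value of __address__. The function name and sample labels are assumptions for illustration.

```python
def finalize_target_labels(labels):
    """Produce the labels a target ends up with after relabeling."""
    result = dict(labels)
    if not result.get("instance") and "__address__" in result:
        result["instance"] = result["__address__"]  # default instance label
    # Labels prefixed with __ are internal and do not reach the stored series.
    return {name: value for name, value in result.items() if not name.startswith("__")}


if __name__ == "__main__":
    print(finalize_target_labels({
        "__address__": "10.0.0.5:8080",               # hypothetical target
        "__meta_kubernetes_pod_label_app": "shop",
        "app": "shop",
        "job": "demo",
    }))
```

This is why anything worth keeping from the __meta_ labels has to be copied into a regular label during relabeling.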