This talk is a quick intro to Prometheus with an overview of all its components. The presentation points to a publicly available demo so that you can see those components in action.
4. What is Prometheus?
A community-driven, open-source monitoring and
alerting toolkit. It ships a time series database, an
alerting component, and a number of integration tools to
expose metrics.
5. A bit of context around Prometheus
Started in 2012 as a SoundCloud
internal project
Second project to join CNCF after
Kubernetes
16. PromQL & Label based queries
http_requests_total
Selects all time series for the metric http_requests_total
http_requests_total{code="200",method="get"}
Selects the time series for successful requests with method get for the metric http_requests_total
http_requests_total{code="200",method="get"}[5m]
Returns a range vector: the samples from the last 5 minutes for each matching series
http_requests_total{status!~"^4..$"}
Selects all time series except those with a 4xx status, using a negative regex matcher (regex matchers are fully anchored, so the ^ and $ are optional)
sum(rate(http_requests_total[5m])) by (job)
Applies functions: rate() computes the per-second rate over the 5-minute range vector, and sum() aggregates the result by job
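The matcher and aggregation semantics can be made concrete with a toy model in Python. This is only a sketch, not how Prometheus evaluates queries: the `SERIES` data and the `matches`, `select` and `sum_by` helpers are hypothetical, and Python's `re` module stands in for the RE2 syntax that Prometheus uses.

```python
import re
from collections import defaultdict

# Hypothetical in-memory series: (labels, current value). Not real Prometheus data.
SERIES = [
    ({"__name__": "http_requests_total", "job": "api", "status": "200"}, 120.0),
    ({"__name__": "http_requests_total", "job": "api", "status": "404"}, 7.0),
    ({"__name__": "http_requests_total", "job": "web", "status": "500"}, 3.0),
]

def matches(labels, name, op, pattern):
    """Apply one PromQL-style matcher; a missing label behaves like an empty string."""
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op in ("=~", "!~"):
        # Regex matchers are fully anchored, so "4.." behaves like "^4..$".
        hit = re.fullmatch(pattern, value) is not None
        return hit if op == "=~" else not hit
    raise ValueError(f"unknown matcher operator: {op}")

def select(series, matchers):
    """Keep only the series that satisfy every matcher."""
    return [(l, v) for l, v in series if all(matches(l, *m) for m in matchers)]

def sum_by(series, label):
    """Toy equivalent of `sum(...) by (label)` over instant values."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)

# Roughly: sum(http_requests_total{status!~"4.."}) by (job)
not_4xx = select(SERIES, [("__name__", "=", "http_requests_total"),
                          ("status", "!~", "4..")])
per_job = sum_by(not_4xx, "job")
```

Here the 404 series is dropped by the negative matcher, and the remaining values are grouped by their job label.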
17. Warning!
Labels are transparent to Prometheus:
whatever you define when exposing a metric will be
available to query in the database.
18. Pulling metrics
● How do you expose those metrics?
● Exporters
● Language-specific SDKs
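To illustrate what "exposing metrics" means at the HTTP level, here is a minimal sketch that uses only the Python standard library. In practice you would normally reach for an exporter or an official client library. The metric name `demo_requests_total`, the `REQUESTS_SERVED` counter and the port are made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application counters; a real service would update these as it works.
REQUESTS_SERVED = {"get": 0, "post": 0}

def render_metrics(counts):
    """Render one counter metric in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests served by the demo app",
        "# TYPE demo_requests_total counter",
    ]
    for method, count in sorted(counts.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the current counter values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_SERVED).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Block forever, answering scrapes on /metrics (the port is an assumption)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a Prometheus scrape job at that port would let the server pull the counter on its own schedule, which is the pull model this slide refers to.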
19. /metrics
# HELP hash_seconds Time taken to create hashes
# TYPE hash_seconds histogram
hash_seconds_bucket{code="200",le="1"} 2
hash_seconds_bucket{code="200",le="2.5"} 2
hash_seconds_bucket{code="200",le="5"} 2
hash_seconds_bucket{code="200",le="10"} 2
hash_seconds_bucket{code="200",le="+Inf"} 2
hash_seconds_sum{code="200"} 9.370800000000002e-05
hash_seconds_count{code="200"} 2
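The output above can be read mechanically: buckets are cumulative, the `+Inf` bucket equals the total count, and sum divided by count gives the mean observation. The sketch below parses the sample text with a deliberately simplified, hypothetical parser (it ignores escaping and timestamps) and derives those values.

```python
import re

# The histogram samples from the slide above, copied as-is.
EXPOSITION = '''\
# HELP hash_seconds Time taken to create hashes
# TYPE hash_seconds histogram
hash_seconds_bucket{code="200",le="1"} 2
hash_seconds_bucket{code="200",le="2.5"} 2
hash_seconds_bucket{code="200",le="5"} 2
hash_seconds_bucket{code="200",le="10"} 2
hash_seconds_bucket{code="200",le="+Inf"} 2
hash_seconds_sum{code="200"} 9.370800000000002e-05
hash_seconds_count{code="200"} 2
'''

SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # simplified parser: skip anything it does not understand
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples

samples = parse(EXPOSITION)
# Cumulative bucket counts keyed by their upper bound (the "le" label).
buckets = {labels["le"]: v for name, labels, v in samples if name.endswith("_bucket")}
totals = {name: v for name, labels, v in samples if not name.endswith("_bucket")}
mean_seconds = totals["hash_seconds_sum"] / totals["hash_seconds_count"]
```

With this sample, both observations fall into the smallest bucket (at most 1 second), and the mean comes out at roughly 47 microseconds per hash.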
20. Exporters
The community has contributed
exporters for pretty much
everything
https://prometheus.io/docs/instrumenting/exporters/
21. Alerting, Writing alerts: a simple example
ALERT <alert_name>
IF <condition>
FOR 5m
LABELS { severity="error" }
ANNOTATIONS {
summary = <summary>,
description = <description>,
}
22. Alert-Manager, Writing alerts: a real-world example
ALERT NODE_DISK_FREE_SPACE_ROOT_PARTITION_80
IF ((node_filesystem_size{fstype="rootfs"}-node_filesystem_avail{fstype="rootfs"})/node_filesystem_size{fstype="rootfs"})*100 > 80
FOR 5m
LABELS { severity="error" }
ANNOTATIONS {
summary = "Current disk usage on root partition is {{ $value }}% on node {{ $labels.instance }}",
description = "Current disk usage on root partition is {{ $value }}% on node {{ $labels.instance }}",
fixes = "Current disk usage on root partition is {{ $value }}% on node {{ $labels.instance }}",
}
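The arithmetic in the IF clause and the effect of FOR 5m can be sketched in Python. This is a simplified model under stated assumptions, not Prometheus's rule evaluator: the function names and the sample tuples are hypothetical, and the condition is checked once per sample rather than at a configured evaluation interval.

```python
def disk_used_percent(size_bytes, avail_bytes):
    """Same arithmetic as the rule: (size - avail) / size * 100."""
    return (size_bytes - avail_bytes) / size_bytes * 100

def alert_state(samples, threshold=80.0, hold_seconds=300):
    """Return 'inactive', 'pending' or 'firing' as of the last sample.

    samples: list of (timestamp_seconds, size_bytes, avail_bytes), oldest first.
    The alert only fires once the condition has held for hold_seconds (FOR 5m).
    """
    active_since = None
    state = "inactive"
    for ts, size, avail in samples:
        if disk_used_percent(size, avail) > threshold:
            if active_since is None:
                active_since = ts  # condition just became true
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            active_since = None  # any dip below the threshold resets the timer
            state = "inactive"
    return state

# Hypothetical readings one minute apart: 100 GiB disk with 10 GiB available (90% used).
GIB = 1024 ** 3
four_minutes = [(t, 100 * GIB, 10 * GIB) for t in range(0, 300, 60)]
five_minutes = [(t, 100 * GIB, 10 * GIB) for t in range(0, 360, 60)]
```

In this model, `alert_state(four_minutes)` is still pending while `alert_state(five_minutes)` is firing, which is the point of the FOR clause: short spikes do not page anyone.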
23. Alert Dispatching
The job of Alertmanager is to dispatch
alerts to the right channel according to
their labels, such as severity
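The dispatching idea can be illustrated with a toy router in Python. It is only a sketch of label-based routing: the receiver names (`pager`, `chat`, `email`) and the `ROUTES` table are hypothetical, and a real Alertmanager configuration also covers grouping, inhibition and silencing.

```python
# Hypothetical routing table: the first entry whose labels all match wins.
ROUTES = [
    ({"severity": "critical"}, "pager"),
    ({"severity": "error"}, "chat"),
]
DEFAULT_RECEIVER = "email"

def route(alert_labels):
    """Pick a receiver name for an alert based on its labels."""
    for match, receiver in ROUTES:
        if all(alert_labels.get(k) == v for k, v in match.items()):
            return receiver
    return DEFAULT_RECEIVER

# An alert carrying the labels from the disk-space rule would go to "chat".
receiver = route({"alertname": "NODE_DISK_FREE_SPACE_ROOT_PARTITION_80",
                  "severity": "error"})
```

Alerts without a matching severity fall through to the default receiver, so nothing is silently dropped.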