9. Prometheus
- started by Matt Proud and Julius Volz as an Open Source project
- first commit on 24 November 2012
- public announcement in January 2015
- inspired by Borgmon
- not Borgmon
10. Features – multi-dimensional data model
http_requests_total{instance="web-1", path="/index", status="401", method="GET"}
#metrics x #labels x #values ▶ millions of time series
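To make the multiplication concrete, here is a small Python sketch. It assumes a toy model in which a series is identified by its metric name plus its full label set. The label values and the helper names (`series_id`, `enumerate_series`) are hypothetical. Every distinct combination of label values is a separate series, so the count grows as a product.

```python
from itertools import product

def series_id(metric, labels):
    """Identify a series by metric name plus its label pairs, sorted so order doesn't matter."""
    return (metric, tuple(sorted(labels.items())))

def enumerate_series(metric, label_values):
    """Yield one series identity for every combination of label values."""
    names = sorted(label_values)
    for combo in product(*(label_values[n] for n in names)):
        yield series_id(metric, dict(zip(names, combo)))

# Hypothetical label values for http_requests_total.
label_values = {
    "instance": ["web-1", "web-2", "web-3"],
    "path": ["/index", "/api/comments", "/api/user/:id"],
    "status": ["200", "401", "500"],
    "method": ["GET", "POST"],
}

all_series = set(enumerate_series("http_requests_total", label_values))
print(len(all_series))  # 54, i.e. 3 * 3 * 3 * 2
```

In a real deployment, only combinations that actually occur are stored, so the product is an upper bound. It still shows how a few labels with modest numbers of values turn into many series.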
11. Features – powerful query language
topk(3, sum by(path, method) (
  rate(http_requests_total{status=~"5.."}[5m])
))
histogram_quantile(0.99, sum by(le, path) (
  rate(http_requests_duration_seconds_bucket[5m])
))
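The second query estimates 99th-percentile latency from cumulative histogram buckets. The Python sketch below follows the linear interpolation that Prometheus documents for `histogram_quantile`: it finds the bucket that contains the target rank and interpolates inside it, assuming observations are spread evenly within each bucket. It is a simplified sketch, not the real implementation. It takes a plain list of `(le, cumulative count)` pairs for one label group, assumes the counts are monotonic, and uses hypothetical bucket values.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` is sorted by upper bound and ends with +Inf, like the `le`
    series of one group; counts are cumulative.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank lands in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev_count = buckets[idx - 1] if idx > 0 else (0.0, 0.0)
    if idx == 0 and upper <= 0:
        return upper
    if count == prev_count:
        return lower  # only when rank == 0 and the bucket is empty; avoids 0/0
    # Assume observations are spread evenly inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)

# Hypothetical request-duration buckets for one path: (le in seconds, count).
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.9, buckets))
```

The query keeps `le` in `sum by(le, path)`, so each path ends up with its own cumulative bucket list. That is the input shape this function expects.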
12. Features – powerful query language
topk(3, sum by(path, method) (
  rate(http_requests_total{status=~"5.."}[5m])
))
{path="/api/comments", method="POST"} 105.4
{path="/api/user/:id", method="GET"} 34.122
{path="/api/comment/:id/edit", method="POST"} 29.31
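The query is a pipeline: select, then `rate`, then `sum by`, then `topk`. It can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions. `simple_rate` only computes (last - first) / elapsed over the samples in the window, without the counter-reset handling and extrapolation that Prometheus performs. The sample data and helper names are hypothetical, so the numbers differ from the slide.

```python
import heapq
import re
from collections import defaultdict

def simple_rate(samples):
    """Per-second increase between the first and last (t, value) sample.

    Simplification: ignores counter resets and window-edge extrapolation,
    both of which the real rate() handles.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def sum_by(vector, by):
    """Sum values per distinct combination of the `by` labels."""
    sums = defaultdict(float)
    for labels, value in vector:
        sums[tuple((name, labels.get(name, "")) for name in by)] += value
    return list(sums.items())

def topk(k, vector):
    """Keep the k elements with the largest values, largest first."""
    return heapq.nlargest(k, vector, key=lambda item: item[1])

def top_error_paths(series, k=3):
    # Selector status=~"5..": PromQL regex matchers are fully anchored.
    vector = [
        (labels, simple_rate(samples))
        for labels, samples in series
        if re.fullmatch(r"5..", labels["status"]) and len(samples) >= 2
    ]
    return topk(k, sum_by(vector, ["path", "method"]))

# Hypothetical 5-minute windows of counter samples: (seconds, value).
series = [
    ({"instance": "web-1", "path": "/api/comments", "method": "POST", "status": "500"}, [(0, 0), (300, 18000)]),
    ({"instance": "web-2", "path": "/api/comments", "method": "POST", "status": "503"}, [(0, 100), (300, 13600)]),
    ({"instance": "web-1", "path": "/api/user/:id", "method": "GET", "status": "500"}, [(0, 0), (300, 9000)]),
    ({"instance": "web-1", "path": "/index", "method": "GET", "status": "500"}, [(0, 0), (300, 600)]),
    ({"instance": "web-1", "path": "/index", "method": "GET", "status": "200"}, [(0, 0), (300, 90000)]),
    ({"instance": "web-1", "path": "/api/comment/:id/edit", "method": "POST", "status": "502"}, [(0, 0), (300, 3000)]),
]

for group, value in top_error_paths(series):
    print(dict(group), value)
```

Note that the two `/api/comments` series from different instances collapse into one result. `sum by(path, method)` drops the `instance` and `status` labels.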
13. Features – easy to use, yet scalable
- single static binary, no dependencies
$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus
- local storage
- high throughput (millions of time series, 380,000 samples/sec)
- efficient compression
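Regular scrapes compress well partly because timestamps arrive at nearly fixed intervals. The Python sketch below illustrates the delta-of-delta idea: store the first timestamp and the first delta, then only the change in delta. This is the principle behind double-delta timestamp encodings such as those in Prometheus's storage. It is an illustration of the idea only, not Prometheus's actual chunk format, which also bit-packs these numbers and encodes sample values separately. The timestamps and function names are hypothetical.

```python
def encode_timestamps(ts):
    """Delta-of-delta encode integer timestamps (milliseconds)."""
    if len(ts) < 2:
        return list(ts)
    out = [ts[0], ts[1] - ts[0]]  # first timestamp, first delta
    prev_delta = ts[1] - ts[0]
    for prev, cur in zip(ts[1:], ts[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # 0 whenever the interval is unchanged
        prev_delta = delta
    return out

def decode_timestamps(enc):
    """Invert encode_timestamps by re-accumulating deltas."""
    if len(enc) < 2:
        return list(enc)
    ts = [enc[0], enc[0] + enc[1]]
    delta = enc[1]
    for dod in enc[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts

# Hypothetical scrapes every 15 s, with one scrape arriving 3 ms late.
scrapes = [1_000_000, 1_015_000, 1_030_000, 1_045_003, 1_060_000]
encoded = encode_timestamps(scrapes)
print(encoded)  # [1000000, 15000, 0, 3, -6]
```

After the first two entries, the stored values are tiny (0, 3, -6). A bit-packing encoder can store them in a few bits each instead of a full 64-bit timestamp.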
29. AWS EC2
scrape_configs:
  - job_name: "foo-api"
    metrics_path: "/metrics"
    ec2_sd_configs:
      - region: us-east-1
        refresh_interval: 60s
        port: 80
The following meta labels are available during relabeling:
- __meta_ec2_instance_id: the EC2 instance ID
- __meta_ec2_public_ip: the public IP address of the instance
- __meta_ec2_private_ip: the private IP address of the instance, if present
- __meta_ec2_tag_<tagkey>: each tag value of the instance
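Meta labels exist only during relabeling. They are typically used to filter targets or to copy information into real labels. The Python sketch below models a subset of `relabel_configs` semantics:

- the `keep`, `drop` and `replace` actions
- source labels joined with `;`
- a fully anchored regex
- `$1`-style replacement

It then applies the defaults of setting `instance` from `__address__` and dropping `__`-prefixed labels. The function names, the `Name` tag, the IP addresses and the instance ID are all hypothetical. Prometheus uses Go regular expressions, whose syntax differs from Python's in places.

```python
import re

def _expand(match, replacement):
    """Turn Prometheus-style $1 / ${1} references into a Python match expansion."""
    return match.expand(re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement))

def relabel(labels, action, source_labels, regex="(.*)",
            target_label=None, replacement="$1", separator=";"):
    """Apply one relabeling step; return None if the target is dropped."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # the regex must match the whole value
    if action == "keep":
        return labels if match else None
    if action == "drop":
        return None if match else labels
    if action == "replace":
        if match is None:
            return labels
        out = dict(labels)
        new_value = _expand(match, replacement)
        if new_value:
            out[target_label] = new_value
        else:
            out.pop(target_label, None)  # an empty result removes the label
        return out
    raise ValueError(f"unsupported action: {action}")

def finalize(labels):
    """Default `instance` to __address__, then drop internal __ labels."""
    out = dict(labels)
    out.setdefault("instance", out["__address__"])
    return out["__address__"], {k: v for k, v in out.items() if not k.startswith("__")}

# Hypothetical target as discovered by ec2_sd_configs with port: 80.
target = {
    "job": "foo-api",
    "__address__": "10.0.0.5:80",  # the private IP is the default address
    "__meta_ec2_instance_id": "i-0123456789abcdef0",
    "__meta_ec2_private_ip": "10.0.0.5",
    "__meta_ec2_public_ip": "54.0.0.10",
    "__meta_ec2_tag_Name": "foo-api-1",
}

steps = [
    # Keep only instances whose (hypothetical) Name tag starts with "foo-api-".
    dict(action="keep", source_labels=["__meta_ec2_tag_Name"], regex="foo-api-.*"),
    # Scrape the public IP instead of the private one.
    dict(action="replace", source_labels=["__meta_ec2_public_ip"],
         target_label="__address__", replacement="$1:80"),
    # Use the Name tag as the instance label.
    dict(action="replace", source_labels=["__meta_ec2_tag_Name"], target_label="instance"),
]

labels = target
for step in steps:
    labels = relabel(labels, **step)
    if labels is None:
        break

if labels is not None:
    print(finalize(labels))  # ('54.0.0.10:80', {'job': 'foo-api', 'instance': 'foo-api-1'})
```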
32. Alerting
- no opinions
- directly defined on time series data
- verbose on firing ▶ compact but detailed on notification
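The contrast between verbose firing and compact notifications comes from grouping. A rule fires one alert per resulting series, and the Alertmanager batches alerts that share the configured grouping labels into one notification. Below is a minimal Python sketch of that grouping step. It assumes `group_by: [job]`, uses hypothetical alerts, and uses a message format made up for illustration. The real Alertmanager also handles timing (`group_wait`, `group_interval`), deduplication and routing.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Batch alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple((name, alert["labels"].get(name, "")) for name in group_by)
        groups[key].append(alert)
    return groups

def render(key, batch):
    """One compact notification that still lists every firing alert."""
    header = ", ".join(f"{k}={v}" for k, v in key)
    lines = [f"[FIRING:{len(batch)}] {header}"]
    for a in batch:
        lines.append(f"  - {a['labels']['path']}: {a['value']:.1f}% 5xx errors")
    return "\n".join(lines)

# Hypothetical alerts produced by the HighErrorRate rule.
alerts = [
    {"labels": {"alertname": "HighErrorRate", "job": "foo-api", "path": "/api/comments"}, "value": 4.2},
    {"labels": {"alertname": "HighErrorRate", "job": "foo-api", "path": "/api/user/:id"}, "value": 1.7},
    {"labels": {"alertname": "HighErrorRate", "job": "bar-api", "path": "/index"}, "value": 2.5},
]

for key, batch in group_alerts(alerts, ["job"]).items():
    print(render(key, batch))
```

Three firing alerts become two notifications. Each notification still names every affected path.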
33. Alerting
ALERT HighErrorRate
  IF sum by(job, path)(rate(http_requests_total{status=~"5.."}[5m])) /
     sum by(job, path)(rate(http_requests_total[5m])) * 100 > 1
  FOR 10m
  SUMMARY "high number of 5xx errors"
  DESCRIPTION "{{ $labels.job }} has {{ $value }}% 5xx errors on {{ $labels.path }}"
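The `FOR 10m` clause means the condition must hold for 10 minutes before the alert fires. Until then the alert is pending, and any evaluation where the condition is false resets it. The Python sketch below models that pending-to-firing transition for a single alert instance. It is a simplification of the rule evaluator: one series, no resolved state. The class name and the once-a-minute evaluation schedule are hypothetical.

```python
INACTIVE, PENDING, FIRING = "inactive", "pending", "firing"

class ForClause:
    """Track one alert instance across rule evaluations."""

    def __init__(self, hold_seconds):
        self.hold = hold_seconds
        self.active_since = None

    def evaluate(self, now, condition_true):
        if not condition_true:
            self.active_since = None  # a single false evaluation resets the timer
            return INACTIVE
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.hold:
            return FIRING
        return PENDING

# FOR 10m, evaluated once a minute: 5 true, 1 false, then 11 true evaluations.
rule = ForClause(hold_seconds=600)
conditions = [True] * 5 + [False] + [True] * 11
states = [rule.evaluate(minute * 60, ok) for minute, ok in enumerate(conditions)]
print(states.index(FIRING))  # 16
```

The early run of five true evaluations never fires, because the false evaluation at minute 5 resets the timer. The alert fires only once the condition has held continuously from minute 6 to minute 16.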