Julien Pivotto
@roidelapluie
Monitoring in a fast-changing world with
Prometheus
October 2021
Monitoring
• Applications are short lived
• Updated often
• Infrastructure changes
(Nothing new...)
A fast changing world
• Monitoring an infrastructure
• Monitoring user experience
... together (dev&ops)
Monitoring in a "fast changing world"
CPU Usage, Disk space, Memory, Open file descriptors, ...
Infrastructure monitoring
Request Rate, Request Errors, Request Duration
Utilization, Saturation, Errors
User experience monitoring
RED method by Tom Wilkie, USE method by Brendan Gregg
• High level overview of the state of a service/component
• Performance
• Availability
• Technical components
What is going on?
What is monitoring?
• Understand how your services behave
• As if you were in their place
• Without specific code
Why is this going on?
What's observability?
• Monitoring is required
• Some monitoring systems are designed for observability
• If lucky, monitoring is enough
• Observability is removing luck
How do monitoring and observability connect?
Three pillars:
• Metrics
• Logs
• Traces
What's observability - in Practice?
Metrics
https://play.grafana.org/
Logs
Traces
https://www.jaegertracing.io/img/trace-detail-ss.png
• Culture: building together
• Automation
• Measurement
• Sharing
A DevOps world
John Willis and Damon Edwards, 2010
Prometheus
• Open Source monitoring solution
• Graduated CNCF Project
• Born in 2012, publicly announced in 2015
• Collects metrics
• Plenty of integrations
• Service discovery, e.g. for Kubernetes
• Easy to use query language
• Built-in alerting
Prometheus
• A community
• A server and many other components
• An ecosystem
What "Prometheus" means
• Open Source
• Pull-based Monitoring over HTTP
• Powerful query language
• Optimized TSDB
The Prometheus server
How does it work?
Metrics
Prometheus scrapes metrics over HTTP.
caddy_http_requests_total{code="200",method="POST",path="/load"} 1
Dimensional data model, for filtering and aggregation.
Metrics and Labels
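To make the dimensional model concrete, the sketch below splits the sample line above into a metric name, a label dictionary, and a value. It is a simplified illustration only: it ignores escaping, timestamps, and comment lines, and `parse_sample` is a hypothetical helper, not part of any Prometheus client library.

```python
import re

# Matches: name{label="value",...} value  (simplified: no escapes, no timestamp)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'caddy_http_requests_total{code="200",method="POST",path="/load"} 1'
)
print(name, labels, value)
```

Each label is one dimension that a query can later filter on or aggregate away.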
sum(rate(caddy_http_requests_total{code=~"5.."}[5m]))
Gets the rate of all 5xx HTTP responses (server errors).
Querying metrics and Labels
sum(rate(caddy_http_requests_total{code=~"5.."}[5m])) without(code)
/
sum(rate(caddy_http_requests_total[5m])) without(code)
Gets the ratio of 5xx HTTP responses (server errors) to all responses; multiply by 100 for a percentage.
Querying metrics and Labels
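The aggregation in this query can be imitated in plain Python. The sketch below assumes made-up per-second rates (the numbers and label sets are hypothetical) and shows how summing "without code" groups series by their remaining labels before the division. `sum_without` is an illustrative helper, not a Prometheus API.

```python
import re
from collections import defaultdict

# Hypothetical per-second rates, keyed by label set, as rate(...[5m]) might return them.
rates = [
    ({"code": "200", "method": "GET"}, 9.0),
    ({"code": "500", "method": "GET"}, 1.0),
    ({"code": "200", "method": "POST"}, 4.0),
    ({"code": "503", "method": "POST"}, 1.0),
]


def sum_without(series, label):
    """Sum values over series that agree on every label except `label`."""
    totals = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        totals[key] += value
    return dict(totals)


# code=~"5.." is a fully anchored regex match, hence re.fullmatch.
errors = sum_without([s for s in rates if re.fullmatch("5..", s[0]["code"])], "code")
total = sum_without(rates, "code")
# The division pairs up series whose remaining labels are identical.
ratio = {key: errors[key] / total[key] for key in errors if key in total}
print(ratio)
```

With these numbers the result is one ratio per `method`, because `code` was the only label aggregated away.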
• Metrics do not represent problems
• Metrics represent a state, give insights
• Metrics can be graphed
• You can alert based on them
Metrics and monitoring
In general you can just expose counters and let the monitoring server do the
real maths.
That keeps the overhead on the apps very low.
Exposed metrics are "raw"
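As a rough illustration of the "real maths" done on the server side, the sketch below derives a per-second rate from raw counter samples and tolerates a counter reset after an application restart. It is deliberately simplified (the real `rate()` function also extrapolates to the window boundaries), and the sample values are invented.

```python
def counter_increase(samples):
    """Total increase across (timestamp, value) samples, tolerating resets."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again at 0.
        increase += cur - prev if cur >= prev else cur
    return increase


def simple_rate(samples):
    """Average per-second increase over the sampled window."""
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed


# Scraped every 15 s; the app restarts between the third and fourth scrape.
samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 20.0), (60, 50.0)]
print(simple_rate(samples))
```

The application only ever increments a number; everything else happens at query time.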
Architecture
• Prometheus server
• Alertmanager
• Exporters
Prometheus components
• Single binary
• No clustering
• No dependency on distributed FS
Prometheus server
• Single binary
• Clustering (gossip protocol)
Alertmanager
Automation
Let's see what makes Prometheus play nicely with automation tools.
Automating Prometheus
• Works on your machine
• Container ready
• Not tied to Kubernetes (on Kubernetes, see prometheus-operator)
Deploy anywhere
• Reloads on SIGHUP
• /-/reload endpoint (--web.enable-lifecycle)
• Working to have less and less overhead on reloads
Reloading Prometheus
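An automation tool can trigger the HTTP reload with a plain POST request. The sketch below builds that request with the Python standard library; the base URL `http://localhost:9090` is an assumption, and the endpoint only responds when Prometheus runs with `--web.enable-lifecycle`.

```python
import urllib.request


def reload_request(base_url="http://localhost:9090"):
    """Build (but do not send) the POST request that asks Prometheus to reload."""
    return urllib.request.Request(f"{base_url}/-/reload", method="POST")


def trigger_reload(base_url="http://localhost:9090", timeout=10):
    """Send the reload request; raises urllib.error.HTTPError on an error reply."""
    with urllib.request.urlopen(reload_request(base_url), timeout=timeout) as resp:
        return resp.status


req = reload_request()
print(req.get_method(), req.full_url)
```

Calling `trigger_reload()` against a running server would perform the reload; the example itself only prints the request it would send.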
- template:
    src: prometheus.yml.j2
    dest: /etc/prometheus/prometheus.yml
    validate: /usr/bin/promtool check config %s
Also: check rules, check web-config.
Ansible
Plenty of situations do not require a reload of Prometheus:
• Password files
• TLS certificates
Prometheus will read them before use, no reload needed!
Not reloading Prometheus
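The "read before use" pattern can be sketched in a few lines. This is an illustration of the idea, not Prometheus's actual implementation; `FileSecret` and the file name `consul_token` are placeholders chosen for the example.

```python
import os
import tempfile


class FileSecret:
    """Return the current content of a secret file each time it is needed."""

    def __init__(self, path):
        self.path = path

    def get(self):
        # Re-read on every use so a rotated secret is picked up without a reload.
        with open(self.path, encoding="utf-8") as handle:
            return handle.read().strip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "consul_token")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first-token\n")
    secret = FileSecret(path)
    before = secret.get()
    # Simulate an external tool rotating the secret on disk.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("second-token\n")
    after = secret.get()

print(before, after)
```

Because nothing is cached, whoever rotates the file does not need to notify the consumer.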
HashiCorp Vault enables retrieving temporary secrets and writing them to a file.
./vault agent -config vault-agent.hcl
Using vault
https://inuits.eu/blog/prometheus-consul-vault-228/
scrape_configs:
  - job_name: consul-services
    consul_sd_configs:
      - server: localhost:8500
        authorization:
          credentials_file: consul_token
Reading token from vault
Prometheus offers native TLS and basic auth.
tls_server_config:
  cert_file: server.crt
  key_file: server.key
basic_auth_users:
  alice: $2y$10$mDwo.lAisC94iLAyP8
  bob: $2y$10$hLqFl9jSjoAAy95Z/zw8
TLS and basic auth
The "web-config" file is read on every request:
• No need to reload
• Instantly change passwords, cert files
Shared config format between Prometheus and exporters!
TLS and basic auth
Prometheus has a snapshot API.
Enable with --web.enable-admin-api
curl -d{} http://localhost:9090/api/v1/admin/tsdb/snapshot
Prometheus TSDB is made of immutable blocks. Snapshots use hard links.
Backups
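Why hard links make snapshots cheap can be shown with the standard library alone. The sketch below is not how Prometheus lays out its blocks (the flat directory and the file name `chunks` are simplifications for the example); it only demonstrates that linking an immutable file creates a second name for the same data without copying it.

```python
import os
import tempfile


def snapshot(block_dir, snapshot_dir):
    """Hard-link every file of an immutable block into snapshot_dir."""
    os.makedirs(snapshot_dir)
    for name in os.listdir(block_dir):
        os.link(os.path.join(block_dir, name), os.path.join(snapshot_dir, name))


with tempfile.TemporaryDirectory() as tmp:
    block = os.path.join(tmp, "block")
    os.makedirs(block)
    with open(os.path.join(block, "chunks"), "wb") as handle:
        handle.write(b"immutable sample data")
    snap = os.path.join(tmp, "snapshot")
    snapshot(block, snap)
    # Both names point at the same inode: no data was copied.
    same_inode = os.path.samefile(
        os.path.join(block, "chunks"), os.path.join(snap, "chunks")
    )
    link_count = os.stat(os.path.join(block, "chunks")).st_nlink

print(same_inode, link_count)
```

This only works because blocks never change after they are written; a mutable file would change under the snapshot as well.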
Service Discovery
• Easier to know what's down with Pull
• Easy debugging (curl)
• Easier to spread the load
• Central configuration point
• High Availability
Prometheus pull model
• Prometheus must know what to pull
• Source of Truth
• Service Discovery != Auto discovery
• Event based when possible
Service Discovery
• Kubernetes
• Consul
• Cloud providers (Azure, AWS, GCP, DigitalOcean, Hetzner, Scaleway, Linode)
• Docker & Docker Swarm
• And more! 20+ external SD in total.
Sources of Truth
• Static SD: in the main Prometheus configuration
• File SD: Files on disk
• HTTP SD: HTTP endpoints
Generic Service Discovery
https://inuits.eu/blog/prometheus-http-service-discovery/
[
  {
    "targets": ["10.0.10.2:9100"],
    "labels": {
      "__meta_datacenter": "london"
    }
  }
]
Generic SD format (file & http SD)
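A script that feeds File SD can produce this format with a few lines of Python. The sketch below writes the target groups to a temporary file and then renames it into place, so a reader never sees a half-written file. The file name `targets.json` and the helper `write_file_sd` are assumptions made for the example.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """Write target groups as JSON, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(groups, handle, indent=2)
    # os.replace is atomic within one filesystem, so readers never see a partial file.
    os.replace(tmp_path, path)


groups = [{"targets": ["10.0.10.2:9100"], "labels": {"__meta_datacenter": "london"}}]

with tempfile.TemporaryDirectory() as tmp:
    target_file = os.path.join(tmp, "targets.json")
    write_file_sd(target_file, groups)
    with open(target_file, encoding="utf-8") as handle:
        loaded = json.load(handle)

print(loaded)
```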
• Both integrate your own SD systems into Prometheus
• File SD is event based (inotify)
• HTTP SD can be integrated in your apps
File SD vs HTTP SD
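To show how little an application needs in order to act as an HTTP SD source, here is a minimal sketch using only the Python standard library. It serves the same JSON target-group format with an `application/json` content type. The inventory in `TARGET_GROUPS` is hypothetical, and the example fetches its own endpoint once instead of involving a Prometheus server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical inventory; a real app would derive this from its own state.
TARGET_GROUPS = [{"targets": ["10.0.10.2:9100"], "labels": {"__meta_datacenter": "london"}}]


class SDHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(TARGET_GROUPS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # Keep the example quiet.


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), SDHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the endpoint once, bypassing any proxy configured in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
url = f"http://127.0.0.1:{server.server_port}/"
with opener.open(url, timeout=5) as resp:
    content_type = resp.headers["Content-Type"]
    fetched = json.loads(resp.read())
server.shutdown()
server.server_close()
print(content_type, fetched)
```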
Labels can be used to configure targets.
• __address__: 127.0.0.1:9090
• __metrics_path__: /metrics
• __scheme__: http or https
• __param_<name>: http parameter
• __scrape_interval__, __scrape_timeout__: 1m
Labels
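How these special labels turn into an actual scrape request can be sketched as follows. `scrape_url` is a hypothetical helper written for this illustration, not Prometheus code, and it leaves out the interval and timeout labels because they do not affect the URL.

```python
from urllib.parse import urlencode


def scrape_url(labels):
    """Assemble the URL a scrape would hit from a target's special labels."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    # Every __param_<name> label becomes a URL query parameter <name>.
    params = {
        key[len("__param_"):]: value
        for key, value in sorted(labels.items())
        if key.startswith("__param_")
    }
    query = f"?{urlencode(params)}" if params else ""
    return f"{scheme}://{labels['__address__']}{path}{query}"


plain = scrape_url({"__address__": "127.0.0.1:9090"})
probe = scrape_url({
    "__address__": "127.0.0.1:9115",
    "__metrics_path__": "/probe",
    "__param_module": "http_2xx",
    "__param_target": "http://prometheus.io",
})
print(plain)
print(probe)
```

Since the target is described entirely by labels, anything that can rewrite labels can also reconfigure where and how a target is scraped.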
Additionally, extra labels are added by SD.
• __meta_kubernetes_pod_label_app
• __meta_digitalocean_region
• __meta_linode_public_ipv6
• __meta_scaleway_instance_status
Meta labels
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
A fundamental principle in Prometheus.
Transform input labels into a new set of labels.
Relabeling
• Rename, merge, replace labels
• Conditionally drop label sets
• Keep only matching label sets
Relabeling actions
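The keep and drop actions can be approximated in a few lines of Python. This is a simplified model under stated assumptions: source label values are joined with `;` and the regex must match the whole joined string. The helper names and the two example targets are invented for the illustration.

```python
import re


def matches(labels, source_labels, regex, separator=";"):
    """Join the source label values and test them against a fully anchored regex."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


def keep(targets, source_labels, regex):
    """Keep only the label sets whose joined source labels match."""
    return [t for t in targets if matches(t, source_labels, regex)]


def drop(targets, source_labels, regex):
    """Drop the label sets whose joined source labels match."""
    return [t for t in targets if not matches(t, source_labels, regex)]


targets = [
    {"__address__": "10.0.0.1:9100", "__meta_datacenter": "london"},
    {"__address__": "10.0.0.2:9100", "__meta_datacenter": "paris"},
]
kept = keep(targets, ["__meta_datacenter"], "london")
dropped = drop(targets, ["__meta_datacenter"], "london")
print(kept, dropped)
```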
• Gets lots of labels as input
• Turns them into targets
• Removes labels prefixed with __
• Can use "special labels"
Target relabeling
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - http://prometheus.io
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
Target relabeling example
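The three rules above can be traced step by step with a small simulation. The `replace` function below is a simplified stand-in for the default relabeling action, assuming the defaults regex `(.*)`, replacement `$1`, and separator `;`. It is a sketch for following the label flow, not Prometheus's implementation.

```python
import re


def replace(labels, target_label, source_labels=(), regex="(.*)",
            replacement="$1", separator=";"):
    """Apply one simplified 'replace' relabeling rule and return new labels."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # Rule does not apply; labels are unchanged.
    # Translate $1-style references to Python's \g<1> form before expanding.
    value = match.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
    return {**labels, target_label: value}


# The static target starts out as the scrape address.
labels = {"__address__": "http://prometheus.io", "job": "blackbox"}
labels = replace(labels, "__param_target", ["__address__"])
labels = replace(labels, "instance", ["__param_target"])
labels = replace(labels, "__address__", replacement="127.0.0.1:9115")
# Labels starting with __ are removed once relabeling is done.
final = {k: v for k, v in labels.items() if not k.startswith("__")}
print(labels["__address__"], labels["__param_target"], final)
```

The scrape goes to the blackbox exporter at `127.0.0.1:9115`, the probed URL travels as the `target` parameter, and the stored series carry the probed URL as `instance`.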
https://relabeler.promlabs.com/
Relabeler
puppetdb_sd_configs:
  - url: http://127.0.0.1:8080/
    query: 'resources { type = "Apache::Vhost" }'
    include_parameters: true
relabel_configs:
  - source_labels: [__meta_puppetdb_parameter_servername]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 127.0.0.1:9115
Relabel from SD
Recap
• Simple to deploy
• Behaves as you expect
• Easy to reload
Prometheus is automation friendly
• Easy password/cert rotation
• Service discovery to keep up with infrastructure changes
Prometheus is change friendly
https://prometheus.io/community
Prometheus is open, join us!
Julien Pivotto
@roidelapluie
roidelapluie@inuits.eu
Essensteenweg 31
2930 Brasschaat
Belgium
Contact:
info@inuits.eu
+32-3-8082105
