SlideShare a Scribd company logo
1 of 29
Download to read offline
Monitoring a Kubernetes-backed microservice
architecture with Prometheus
Björn “Beorn” Rabenstein — SoundCloud
Fabian Reinartz — CoreOS
SoundCloud 2012 – from monolith …
… to microservices
Orchestration needed
Run containers in a cluster…
In-house innovation: Bazooka – PaaS, Heroko style.
Problems:
- Only 12-factor apps (stateless etc.).
- Limited resource isolation.
- No sidecars.
- Maturity.
Meanwhile, the open-source world has evolved…
K U B E R N E T E S
Kubernetes
- inspired by Google’s Borg
- not Borg
Today:
Tomorrow:
X
Labels Labels are the
new hierarchies!
name = MyAPI
app = MyAPI
version = 0.4
app = MyAPI
version = 0.5
app = MyAPI
version = 0.4
app = MyDB
cluster = auth
app = MyDB
cluster = misc
select by [ app = MyAPI ]
service
pod
pod
pod
pod
pod
Graphite
Monitoring at SC 2012
Service A
Service B
Service C
Monitoring challenges
‐ A lot of traffic to monitor
- Monitoring traffic should not be proportional to user traffic
‐ Way more targets to monitor
- One host can run many containers
‐ And they constantly change
- Deploys, scaling, rescheduling unhealthy instances …
‐ Need a fleet-wide view.
- What’s my overall 99th percentile latency?
‐ Still need to be able to drill down for troubleshooting
- Which instance/endpoint/version/... causes those errors I’m seeing?
‐ Meaningful alerting
- Symptom-based alerting for pages, cause-based alerting for warnings
- See Rob Ewaschuk’s “My philosophy on alerting" https://goo.gl/2vrpSO
Monitor everything, all levels, with the same system
Level What to monitor (examples) What exposes metrics (example)
Network Routers, switches SNMP exporter
Host (OS, hardware) Hardware failure, provisioning, host resources Node exporter
Container Resource usage, performance characteristics cAdvisor
Application Latency, errors, QPS, internal state Your own code
Orchestration Cluster resources, scheduling Kubernetes components
P R O M E T H E U S
“Obviously the solution to all our problems with everything forever, right?”
Benjamin Staffin, Fitbit Site Operations
Prometheus
- inspired by Google’s Borgmon
- not Borgmon
- initially developed at SoundCloud, open-source from the beginning
- public announcement early 2015
- collects metrics at scale via HTTP (think: yet another client of your microservice)
- thousands of targets, millions of time series, 800k samples/s, no dependencies
- easy to scale
Features – multi-dimensional data model
http_requests_total{instance="web-1", path="/index", status="500", method="GET"}
http_requests_total{instance="web-1", path="/index", status="404", method="POST"}
http_requests_total{instance="web-3", path="/index", status="200", method="GET"}
#metrics x #values(instance) x #values(path) x #values(status) x #values(method)
▶ millions of time series
Labels are the
new hierarchies!
Features – powerful query language
The 3 path-method combinations with the highest number of failing requests?
topk(3, sum by(path, method) (
rate(http_requests_total{status=~"5.."}[5m])
))
The 99th percentile request latency by request path?
histogram_quantile(0.99, sum by(le, path) (
rate(http_requests_duration_seconds_bucket[5m])
))
The questions to ask are often not known beforehand.
Features – powerful query language
topk(3, sum by(path, method) (
rate(http_requests_total{status=~"5.."}[5m])
))
{path="/api/comments", method="POST"} 105.4
{path="/api/user/:id", method="GET"} 34.122
{path="/api/comment/:id/edit", method="POST"} 29.31
Features – easy instrumentation
from prometheus_client import start_http_server, Histogram
# Create a metric to track time spent and requests made.
REQUEST_TIME = Histogram('request_processing_seconds', 'Time spent processing request')
# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
# do work …
return
start_http_server(8000)
Integrations (selection)
K U B E R N E T E S
P R O M E T H E U S
Three questions
How to monitor
services running
on Kubernetes
with Prometheus?
X
?
?
How to monitor
Kubernetes and
containers with
Prometheus?
How to run
Prometheus on
Kubernetes?
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
Monitoring Services via Exporters
svc = mysql
namespace = production
MySQL
exporter
MySQL
Monitoring Kubernetes
namespace = kube-system
apiserver
kubelet
kubelet
etcd
kubelet
kubelet
kubelet
kubelet
Running Prometheus on Kubernetes
- So far: Prometheus ran outside of cluster
- Pod IPs must be routable
- Conventional deployment (Chef, Puppet, …)
- Service discovery needs authentication
- To run Prometheus inside of cluster:
kubectl run --image="quay.io/prometheus/prometheus:0.18.0" prometheus
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
app = prometheus
namespace = kube-system
apiserver
kubelet
kubelet
etcd
kubelet
kubelet
kubelet
kubelet
app = prometheus
namespace = infra
Monitoring Kubernetes
What about storage?
A) None
B) Network/Cloud volumes
C) Host volumes
DEMO
CC BY 2.0
Author: carstingaxion / Carsten Bach
The end

More Related Content

What's hot

promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...Tokuhiro Matsuno
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Matthew Campbell
 
Service Discovery in Prometheus
Service Discovery in PrometheusService Discovery in Prometheus
Service Discovery in PrometheusOliver Moser
 
Monitoring infrastructure with prometheus
Monitoring infrastructure with prometheusMonitoring infrastructure with prometheus
Monitoring infrastructure with prometheusShahnawaz Saifi
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaSyah Dwi Prihatmoko
 
Using Grails to power your electric car
Using Grails to power your electric carUsing Grails to power your electric car
Using Grails to power your electric carMarco Pas
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with PrometheusShiao-An Yuan
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Grafana Labs
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheusCeline George
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
The history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudThe history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudTobias Schmidt
 
Kafka monitoring and metrics
Kafka monitoring and metricsKafka monitoring and metrics
Kafka monitoring and metricsTouraj Ebrahimi
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.Grafana Labs
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusMatthias Grüter
 
20171027 モニタリング勉強会
20171027 モニタリング勉強会20171027 モニタリング勉強会
20171027 モニタリング勉強会Paul Traylor
 

What's hot (20)

promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)
 
Service Discovery in Prometheus
Service Discovery in PrometheusService Discovery in Prometheus
Service Discovery in Prometheus
 
Monitoring infrastructure with prometheus
Monitoring infrastructure with prometheusMonitoring infrastructure with prometheus
Monitoring infrastructure with prometheus
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Using Grails to power your electric car
Using Grails to power your electric carUsing Grails to power your electric car
Using Grails to power your electric car
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
The history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudThe history of Prometheus at SoundCloud
The history of Prometheus at SoundCloud
 
Kafka monitoring and metrics
Kafka monitoring and metricsKafka monitoring and metrics
Kafka monitoring and metrics
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.The RED Method: How to monitoring your microservices.
The RED Method: How to monitoring your microservices.
 
Prometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb SolutionPrometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb Solution
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to Prometheus
 
20171027 モニタリング勉強会
20171027 モニタリング勉強会20171027 モニタリング勉強会
20171027 モニタリング勉強会
 

Similar to Monitoring a Kubernetes-backed microservice architecture with Prometheus

Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011Opevel
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubAlfonso Martino
 
Deep Dive into SpaceONE
Deep Dive into SpaceONEDeep Dive into SpaceONE
Deep Dive into SpaceONEChoonho Son
 
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016Aad Versteden
 
Webinar september 2013
Webinar september 2013Webinar september 2013
Webinar september 2013Marc Gille
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
Plone FSR
Plone FSRPlone FSR
Plone FSRfulv
 
In cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices MeetupIn cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices MeetupNeil Gehani
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...OpenWhisk
 
Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016Aad Versteden
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemAccumulo Summit
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Chris Bunch
 
Social Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections PinkSocial Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections PinkNico Meisenzahl
 
Apache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless ComputingApache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless ComputingUpkar Lidder
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 

Similar to Monitoring a Kubernetes-backed microservice architecture with Prometheus (20)

Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011Google app-engine-cloudcamplagos2011
Google app-engine-cloudcamplagos2011
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
 
Deep Dive into SpaceONE
Deep Dive into SpaceONEDeep Dive into SpaceONE
Deep Dive into SpaceONE
 
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
 
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
mu.semte.ch - A journey from TenForce's perspective - SEMANTICS2016
 
Webinar september 2013
Webinar september 2013Webinar september 2013
Webinar september 2013
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Plone FSR
Plone FSRPlone FSR
Plone FSR
 
In cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices MeetupIn cluster open source testing framework - Microservices Meetup
In cluster open source testing framework - Microservices Meetup
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
IBM Bluemix OpenWhisk: Serverless Conference 2016, London, UK: The Future of ...
 
Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016Semantic technologies in practice - KULeuven 2016
Semantic technologies in practice - KULeuven 2016
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09
 
Social Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections PinkSocial Connections 13 - Troubleshooting Connections Pink
Social Connections 13 - Troubleshooting Connections Pink
 
Apache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless ComputingApache OpenWhisk Serverless Computing
Apache OpenWhisk Serverless Computing
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Monitoring a Kubernetes-backed microservice architecture with Prometheus

  • 1. Monitoring a Kubernetes-backed microservice architecture with Prometheus Björn “Beorn” Rabenstein — SoundCloud Fabian Reinartz — CoreOS
  • 2. SoundCloud 2012 – from monolith …
  • 4. Orchestration needed Run containers in a cluster… In-house innovation: Bazooka – PaaS, Heroko style. Problems: - Only 12-factor apps (stateless etc.). - Limited resource isolation. - No sidecars. - Maturity. Meanwhile, the open-source world has evolved…
  • 5. K U B E R N E T E S
  • 6. Kubernetes - inspired by Google’s Borg - not Borg Today: Tomorrow: X
  • 7. Labels Labels are the new hierarchies! name = MyAPI app = MyAPI version = 0.4 app = MyAPI version = 0.5 app = MyAPI version = 0.4 app = MyDB cluster = auth app = MyDB cluster = misc select by [ app = MyAPI ] service pod pod pod pod pod
  • 8. Graphite Monitoring at SC 2012 Service A Service B Service C
  • 9. Monitoring challenges ‐ A lot of traffic to monitor - Monitoring traffic should not be proportional to user traffic ‐ Way more targets to monitor - One host can run many containers ‐ And they constantly change - Deploys, scaling, rescheduling unhealthy instances … ‐ Need a fleet-wide view. - What’s my overall 99th percentile latency? ‐ Still need to be able to drill down for troubleshooting - Which instance/endpoint/version/... causes those errors I’m seeing? ‐ Meaningful alerting - Symptom-based alerting for pages, cause-based alerting for warnings - See Rob Ewaschuk’s “My philosophy on alerting" https://goo.gl/2vrpSO
  • 10. Monitor everything, all levels, with the same system Level What to monitor (examples) What exposes metrics (example) Network Routers, switches SNMP exporter Host (OS, hardware) Hardware failure, provisioning, host resources Node exporter Container Resource usage, performance characteristics cAdvisor Application Latency, errors, QPS, internal state Your own code Orchestration Cluster resources, scheduling Kubernetes components
  • 11. P R O M E T H E U S “Obviously the solution to all our problems with everything forever, right?” Benjamin Staffin, Fitbit Site Operations
  • 12. Prometheus - inspired by Google’s Borgmon - not Borgmon - initially developed at SoundCloud, open-source from the beginning - public announcement early 2015 - collects metrics at scale via HTTP (think: yet another client of your microservice) - thousands of targets, millions of time series, 800k samples/s, no dependencies - easy to scale
  • 13. Features – multi-dimensional data model http_requests_total{instance="web-1", path="/index", status="500", method="GET"} http_requests_total{instance="web-1", path="/index", status="404", method="POST"} http_requests_total{instance="web-3", path="/index", status="200", method="GET"} #metrics x #values(instance) x #values(path) x #values(status) x #values(method) ▶ millions of time series Labels are the new hierarchies!
  • 14. Features – powerful query language The 3 path-method combinations with the highest number of failing requests? topk(3, sum by(path, method) ( rate(http_requests_total{status=~"5.."}[5m]) )) The 99th percentile request latency by request path? histogram_quantile(0.99, sum by(le, path) ( rate(http_requests_duration_seconds_bucket[5m]) )) The questions to ask are often not known beforehand.
  • 15. Features – powerful query language topk(3, sum by(path, method) ( rate(http_requests_total{status=~"5.."}[5m]) )) {path="/api/comments", method="POST"} 105.4 {path="/api/user/:id", method="GET"} 34.122 {path="/api/comment/:id/edit", method="POST"} 29.31
  • 16. Features – easy instrumentation from prometheus_client import start_http_server, Histogram # Create a metric to track time spent and requests made. REQUEST_TIME = Histogram('request_processing_seconds', 'Time spent processing request') # Decorate function with metric. @REQUEST_TIME.time() def process_request(t): # do work … return start_http_server(8000)
  • 18. K U B E R N E T E S P R O M E T H E U S
  • 19.
  • 20. Three questions How to monitor services running on Kubernetes with Prometheus? X ? ? How to monitor Kubernetes and containers with Prometheus? How to run Prometheus on Kubernetes?
  • 21. Monitoring Services svc = service-X namespace = production version = 22.4 version = 22.5
  • 22. Monitoring Services via Exporters svc = mysql namespace = production MySQL exporter MySQL
  • 23. Monitoring Kubernetes namespace = kube-system apiserver kubelet kubelet etcd kubelet kubelet kubelet kubelet
  • 24. Running Prometheus on Kubernetes - So far: Prometheus ran outside of cluster - Pod IPs must be routable - Conventional deployment (Chef, Puppet, …) - Service discovery needs authentication - To run Prometheus inside of cluster: kubectl run --image="quay.io/prometheus/prometheus:0.18.0" prometheus
  • 25. Monitoring Services svc = service-X namespace = production version = 22.4 version = 22.5 app = prometheus
  • 27. What about storage? A) None B) Network/Cloud volumes C) Host volumes
  • 28. DEMO CC BY 2.0 Author: carstingaxion / Carsten Bach