Automating the Configuration of
Monitoring on Large Infrastructures
How monitoring of dynamic infrastructures at scale can be made easier with
Uyuni, Prometheus and Grafana
João Cavalheiro, Engineering Manager – jcavalheiro@suse.com
Johannes Renner, Engineering Manager – jrenner@suse.com
Managing IT Infrastructures is hard
● In most companies, the IT landscape is diverse and complex
● ...And nearly impossible to manage beyond a certain scale without
automation
● Modern application stacks are multi-modal: VMs and containers
spread across private and public clouds
● Different operating systems have different requirements
● Many companies require reporting and compliance
● Security is a concern
2
Enter Uyuni
Uyuni is an open-source solution for managing Linux infrastructure
● Can save you time and headaches when you have to manage and
update tens, hundreds or even thousands of machines
● Mass-deploy patches and packages based on software channels
● Consistent and repeatable provisioning and configuration of bare
metal, VMs and containers
● Automates configuration of monitoring with Prometheus and
Grafana
3
Origins: Spacewalk
● Open-source systems management solution
● Upstream for Red Hat Satellite 5, around since 2008
● Supported management of Fedora, CentOS and Debian
● Adopted by SUSE as upstream for SUSE Manager
● Satellite 6 was built on different technologies:
∙ Spacewalk entered maintenance mode
∙ Only bugfixes, no plans for the future
∙ Many patches pending to implement modernizations!
4
Uyuni
/uju:ˈni/
“Salar de Uyuni” is the world's largest salt flat*
Image: https://www.flickr.com/photos/madeleine_h/9468953452/
Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)
* https://en.wikipedia.org/wiki/Salar_de_Uyuni
What is Salt?
● Open-source software for remote task execution and (descriptive)
configuration management
● Works on almost any platform - only Python is needed
● Typically requires an agent (minion) that connects to a master
● ZeroMQ used as default transport
● Event-driven architecture supporting automation
● Scalable, extensible and customizable
6
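To make "descriptive configuration management" concrete, here is a toy Python sketch of the idea behind it: a state describes the desired result, and applying it only changes what differs, so running it twice is harmless. This is not how Salt is implemented and not Salt's API. The state keys merely borrow the names of Salt's `pkg.installed` and `service.running` state functions, and the `Minion` class and package names are hypothetical.

```python
# Toy model of declarative, idempotent configuration management.
# NOT Salt's implementation or API: the state keys only borrow the names of
# Salt's pkg.installed and service.running states for readability.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Minion:
    """Hypothetical managed system: what is installed and what is running."""
    name: str
    packages: Set[str] = field(default_factory=set)
    running: Set[str] = field(default_factory=set)


def apply_state(minion: Minion, desired: Dict[str, List[str]]) -> List[str]:
    """Converge `minion` towards `desired` and return only the changes made.

    Every step first checks the current state, so applying the same
    state a second time changes nothing (idempotency).
    """
    changes: List[str] = []
    for pkg in desired.get("pkg.installed", []):
        if pkg not in minion.packages:
            minion.packages.add(pkg)
            changes.append(f"installed {pkg}")
    for svc in desired.get("service.running", []):
        if svc not in minion.running:
            minion.running.add(svc)
            changes.append(f"started {svc}")
    return changes


if __name__ == "__main__":
    # Hypothetical package and service names.
    state = {"pkg.installed": ["node-exporter"], "service.running": ["node-exporter"]}
    web01 = Minion("web-01")
    print(apply_state(web01, state))  # first run reports two changes
    print(apply_state(web01, state))  # second run reports no changes
```

In Salt itself, the master sends such states to many minions at once, which is what makes the declarative approach scale.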
Salt Concepts
7
Uyuni: An Opinionated Fork of Spacewalk
● New backend based on Salt
● Modernized codebase (React.js, Python 3, JDK11)
● Content lifecycle management
● Container image building and Kubernetes integration
● Improved virtualization management
● Monitoring automation based on Prometheus & Grafana
8
Monitoring 101
9
Getting started with metrics
Main data source for alerting and visualization:
● Starting point for troubleshooting
∙ "Something looks wrong on this dashboard"
∙ Used as Service Level Indicators
● How available are we to the outside world?
∙ What are our customers experiencing?
Good metrics help to eliminate hypotheses before you investigate them.
10
About Prometheus
● Originally built at SoundCloud
● Has its own time-series database
● Data collection via pull model over HTTP
● Targets are set via static configuration or service discovery
● Metrics have a name, a set of labels, a timestamp and a value
11
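The data model from the slide above can be sketched in a few lines of Python. The `Sample` class and the helper names are our own. The rendered line follows the Prometheus text exposition format, in which the optional timestamp is given in milliseconds since the Unix epoch. The metric name together with the label set identifies a time series, so label order does not matter.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Sample:
    """One sample: metric name, label set, value and optional timestamp (ms)."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    value: float = 0.0
    timestamp_ms: Optional[int] = None

    def series_id(self) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
        """Name plus sorted labels identify a time series; label order is irrelevant."""
        return (self.name, tuple(sorted(self.labels.items())))


def _escape(value: str) -> str:
    # Label values escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _format_value(v: float) -> str:
    # Special float values use the spellings of the text format.
    if math.isnan(v):
        return "NaN"
    if math.isinf(v):
        return "+Inf" if v > 0 else "-Inf"
    return repr(float(v))


def to_text_line(s: Sample) -> str:
    """Render a sample as one line of the Prometheus text exposition format."""
    labels = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(s.labels.items()))
    line = s.name + (f"{{{labels}}}" if labels else "") + " " + _format_value(s.value)
    if s.timestamp_ms is not None:
        line += f" {s.timestamp_ms}"
    return line


if __name__ == "__main__":
    s = Sample("node_cpu_seconds_total", {"mode": "idle", "cpu": "0"}, 1234.5, 1571234567000)
    print(to_text_line(s))
```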
Exposing Metrics
● Each application/system we want to monitor must expose metrics
● Instrumentation vs. exporters
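When the metrics endpoint is embedded in the application itself, this is
referred to as instrumentation; an exporter is a separate process that
translates an existing system's metrics into the Prometheus format.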
● Extensive list of Prometheus exporters
∙ https://prometheus.io/docs/instrumenting/exporters/
∙ Node exporter is one of the most widely used
● Easy to build your own exporters
∙ You can monitor almost anything
12
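As a rough illustration of how little an exporter needs, the following sketch serves a single gauge over HTTP with nothing but the Python standard library. The metric name `demo_exporter_uptime_seconds` and the default port 9999 are arbitrary choices for this example. For real projects, the official Prometheus client libraries are the usual route.

```python
# Minimal hand-written exporter using only the standard library.
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def render_metrics() -> str:
    """Build the response body in the Prometheus text exposition format."""
    uptime = time.time() - START_TIME
    return (
        "# HELP demo_exporter_uptime_seconds Seconds since the exporter started.\n"
        "# TYPE demo_exporter_uptime_seconds gauge\n"
        f"demo_exporter_uptime_seconds {uptime:.3f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only /metrics is served; everything else is a 404.
        if self.path.split("?", 1)[0] != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def start_exporter(host: str = "0.0.0.0", port: int = 9999) -> HTTPServer:
    """Serve /metrics in a background thread and return the server object."""
    server = HTTPServer((host, port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Prometheus could then scrape it with a static target such as `localhost:9999`. Each scrape triggers a fresh call to `render_metrics()`, which is the pull model in action.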
Querying Metrics
● Prometheus has its own query language - PromQL
∙ PromQL is a functional expression language
∙ Makes it easy to filter multidimensional time series
● Example: per-second rate of HTTP 500 errors over 5 minutes, as of one hour ago
∙ rate(api_http_requests_total{status="500"}[5m] offset 1h)
● Regex matching
∙ up{instance=~"web-server-.*"} == 0
● Used in all interactions with Prometheus (visualization, alerts)
13
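To show what the two example queries compute, here is a simplified Python re-implementation of their semantics. `select_window()` mimics the `[5m] offset 1h` range selection, `simple_rate()` computes the per-second increase of a counter while compensating for counter resets, and `regex_match()` shows that `=~` matchers are fully anchored. This is a sketch, not Prometheus' algorithm: the real `rate()` also extrapolates to the edges of the window, window boundary handling is simplified here, and Prometheus uses RE2 regular expressions rather than Python's `re`.

```python
# Simplified re-implementations of two PromQL ideas, for illustration only.
import re
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def select_window(points: Sequence[Point], eval_time: float,
                  range_s: float, offset_s: float = 0.0) -> List[Point]:
    """Mimic `metric[range] offset X`: keep points in (end - range, end]."""
    end = eval_time - offset_s
    return [(t, v) for t, v in points if end - range_s < t <= end]


def simple_rate(points: Sequence[Point]) -> Optional[float]:
    """Per-second increase of a counter, treating any drop as a counter reset."""
    if len(points) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        # After a reset (e.g. a process restart) the counter restarts at 0,
        # so the whole new value counts as increase.
        increase += cur - prev if cur >= prev else cur
    return increase / (points[-1][0] - points[0][0])


def regex_match(labels: Dict[str, str], name: str, pattern: str) -> bool:
    """`name=~"pattern"`: fully anchored; a missing label counts as ""."""
    return re.fullmatch(pattern, labels.get(name, "")) is not None


def down_instances(up: Sequence[Tuple[Dict[str, str], float]], pattern: str) -> List[str]:
    """Mimic `up{instance=~pattern} == 0` over (labels, value) pairs."""
    return [lbl["instance"] for lbl, value in up
            if value == 0 and regex_match(lbl, "instance", pattern)]
```

Anchoring is why `web-server-.*` matches `web-server-01:9100` but not `old-web-server-01`.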
Alerts
● Prometheus has its own alerting system – Alertmanager
∙ Takes care of deduplication, grouping, and routing
● Alerting rules are written in PromQL
● Supports HA setups
● Integration with email, PagerDuty and OpsGenie
● HTTP API and CLI tool: amtool
∙ Can be “plugged” into your existing scripts
14
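Because Alertmanager exposes an HTTP API, a script can also raise alerts of its own. The sketch below only builds a `POST` request against the v2 endpoint `/api/v2/alerts`, which accepts a JSON list of alerts with `labels`, `annotations` and timestamps such as `startsAt`. The base URL (Alertmanager listens on port 9093 by default), the alert name and the labels are placeholders.

```python
# Build (but do not send) a request that pushes one alert to Alertmanager's
# v2 HTTP API. The base URL and label values are placeholders.
import json
import urllib.request
from datetime import datetime, timezone


def build_alert_request(base_url: str, alertname: str, instance: str,
                        summary: str, severity: str = "warning") -> urllib.request.Request:
    alert = {
        # Alertmanager identifies an alert by its label set, which is what
        # lets it deduplicate repeated posts of the same alert.
        "labels": {"alertname": alertname, "instance": instance, "severity": severity},
        "annotations": {"summary": summary},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps([alert]).encode("utf-8"),  # the endpoint expects a JSON list
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_alert_request("http://localhost:9093", "BackupFailed",
                              "db-01", "Nightly backup did not finish")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```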
Grafana
● Used to query and visualize metrics
● Works with Prometheus, but is not limited to it
∙ Grafana supports multiple backends
∙ It is possible to combine data from different sources in the same
dashboard
● Fully customizable
∙ Each panel has a wide variety of styling and formatting options
∙ Supports templates
∙ Collection of add-ons and pre-built dashboards
15
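Dashboards can also be managed as code. The following sketch assembles a minimal dashboard with one template variable, `$instance`, populated by Grafana's Prometheus `label_values()` query, and builds a request for Grafana's `POST /api/dashboards/db` endpoint (Grafana listens on port 3000 by default). The title, token and panel contents are placeholders. The dashboard JSON schema changes between Grafana versions (older releases use the `graph` panel type instead of `timeseries`, for example), so treat the field set as an assumption.

```python
# Build a minimal Grafana dashboard and a request to upload it via the
# HTTP API. URL, token and panel contents are placeholders.
import json
import urllib.request


def build_dashboard(title: str, datasource: str = "Prometheus") -> dict:
    """Dashboard with an `instance` template variable and one panel using it."""
    return {
        "id": None,  # null id/uid lets Grafana create a new dashboard
        "uid": None,
        "title": title,
        "templating": {"list": [{
            "name": "instance",
            "type": "query",
            "datasource": datasource,
            "query": "label_values(up, instance)",  # one entry per scraped instance
            "refresh": 1,  # refresh the variable's values on dashboard load
        }]},
        "panels": [{
            "type": "timeseries",
            "title": "CPU usage (non-idle)",
            "datasource": datasource,
            "gridPos": {"x": 0, "y": 0, "w": 24, "h": 8},
            "targets": [{
                "refId": "A",
                # $instance is substituted by Grafana before the query is sent
                "expr": 'sum by (instance) (rate(node_cpu_seconds_total'
                        '{mode!="idle",instance=~"$instance"}[5m]))',
            }],
        }],
    }


def build_upload_request(grafana_url: str, api_token: str,
                         dashboard: dict) -> urllib.request.Request:
    """POST the dashboard; overwrite=True replaces an existing one with the same title."""
    payload = {"dashboard": dashboard, "overwrite": True}
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_token}"},
        method="POST",
    )
```

Passing the result of `build_upload_request()` to `urllib.request.urlopen()` would send it.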
How to Get Started?
● Which components do I need to install?
● How to configure Prometheus and Grafana?
● How to configure my systems to expose their metrics?
● How do I get started with building dashboards?
16
Monitoring at Scale
Data centers commonly hold thousands of machines or more
● Different system types (physical, VMs, containers)
● Different operating systems
● A lot of different metrics from different sources
● What can be automated?
It’s not practical to manually maintain configuration files for all this
diversity!
17
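One way to avoid hand-editing target lists is to generate them. This sketch turns an inventory into a file for Prometheus' file-based service discovery (`file_sd_configs`), which re-reads the files listed in its `files` setting when they change. The inventory keys (`hostname`, `group`, `os`) are hypothetical, and port 9100 is assumed because it is node exporter's default.

```python
# Generate a Prometheus file-based service discovery (file_sd) target file
# from an inventory. The inventory keys and port are assumptions.
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, Iterable, List


def build_file_sd(inventory: Iterable[Dict[str, str]], port: int = 9100) -> List[dict]:
    """One target group per (group, os) pair, each carrying those labels."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for host in inventory:
        groups[(host["group"], host["os"])].append(f"{host['hostname']}:{port}")
    return [
        {"targets": sorted(targets), "labels": {"group": group, "os": os_name}}
        for (group, os_name), targets in sorted(groups.items())
    ]


def write_file_sd(path: str, inventory: Iterable[Dict[str, str]], port: int = 9100) -> None:
    """Write atomically so Prometheus never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(build_file_sd(inventory, port), f, indent=2)
    os.replace(tmp_path, path)  # atomic rename on POSIX file systems
```

A scrape job would then list the generated file (e.g. via a `*.json` glob) under `file_sd_configs`. The temporary file uses a `.tmp` suffix so that such a glob does not pick it up mid-write.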
Putting the Pieces Together
18
Uyuni Meets Monitoring
Automate Prometheus Monitoring with Uyuni
19
Uyuni Meets Monitoring
Single Pane of Glass for Monitoring Configuration
● Provisioning and configuration of Prometheus and Grafana
● Pre-built Grafana dashboards
● Enable exporters on managed clients using Salt Formulas
● Group systems to create common configurations
● Prometheus service discovery
● Reproducible setups
20
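Grouping systems pays off when settings defined once for a group flow down to every member while individual systems can still deviate. The sketch below models that with a recursive dictionary merge. It is not Uyuni's formula data model: the keys, the group-then-system precedence and the alphabetical ordering of multiple groups are assumptions made for illustration.

```python
# Illustration of group-level defaults plus per-system overrides.
# All keys and the precedence order are assumptions, not Uyuni's formula schema.
from copy import deepcopy
from typing import Dict, List


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: `override` wins, nested dicts are merged recursively."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def effective_config(system: str,
                     memberships: Dict[str, List[str]],
                     group_settings: Dict[str, dict],
                     system_settings: Dict[str, dict]) -> dict:
    """Merge the settings of every group the system belongs to (in name order),
    then apply the system's own overrides on top."""
    config: dict = {}
    for group in sorted(memberships.get(system, [])):
        config = deep_merge(config, group_settings.get(group, {}))
    return deep_merge(config, system_settings.get(system, {}))


if __name__ == "__main__":
    memberships = {"web-01": ["web-servers"], "web-03": ["web-servers"]}
    group_settings = {"web-servers": {"node_exporter": {"enabled": True},
                                      "apache_exporter": {"enabled": True}}}
    system_settings = {"web-03": {"apache_exporter": {"enabled": False}}}
    for host in ("web-01", "web-03"):
        print(host, effective_config(host, memberships, group_settings, system_settings))
```

Because `deep_merge()` copies instead of mutating, the group defaults stay untouched no matter how many systems override them.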
Live Demo
21
Coming next
● Support for Prometheus federation
● Improve the existing automation (e.g. more exporters), including:
∙ cAdvisor for Docker containers
∙ libvirt exporter for KVM hypervisors
∙ Kubernetes
∙ blackbox exporter
● Alerting templates
● Authentication and TLS encryption
● Automated firewall configuration
22
Questions?
23
https://www.uyuni-project.org/
github.com/uyuni-project
@UyuniProject
uyuni-announce+subscribe@opensuse.org
#uyuni @ irc.freenode.org
Thank you!

OSMC 2019 | Automating the Configuration of Monitoring on Large Infrastructures by João Cavalheiro
