IcingaCamp Stockholm - How to make your monitoring shut up

Icinga

Presentation given at Icinga Camp Stockholm in October 2016

How to make your monitoring
shut up
Observations on what to monitor and when to alert

Hello!
@moegyver @EttLejon
(though I hardly use Twitter…)
(easier to find me as lejonet
on Freenode/OFTC IRC)

What is your name?
Ops/DevOps
What is your quest?
A healthy and functioning server
environment
What is… your favourite time to
be woken up by an alert?
Erh.. Never.. Erh office hou
aaaaaaaaaah!

Example Checklist for monitoring vs alerting
1. What am I actually monitoring: a service, or a detail of
part of a service (hardware, connectivity, etc.)?
If the answer isn't a service, it might not be fit
for alerting.
2. Will I get fired for not responding to this immediately?
(i.e. is it business critical?)
If the answer is yes, it may be fit for
alerting.
3. Is what I want to check possibly covered by
another alert or monitoring check? E.g. a check that
requests / in the webroot already answers whether
connectivity works, and also whether the webserver
is properly configured.
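Point 3 can be sketched as a single service-level check. This is only an illustration, not the speaker's implementation: the function names `fetch_status` and `check_webroot` are hypothetical, and the exit codes follow the common plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The assumption is that any 2xx/3xx answer for `/` counts as healthy.

```python
import urllib.error
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # conventional plugin exit codes


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET of url (raises OSError on network failure)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code


def check_webroot(url, fetch=fetch_status):
    """Check that a GET of url succeeds; returns (exit_code, message).

    A successful answer implies DNS, routing, the listening socket and the
    webserver configuration all work, so separate checks for those need not alert.
    """
    try:
        status = fetch(url)
    except OSError as err:  # URLError, timeouts and refused connections are OSErrors
        return CRITICAL, f"HTTP CRITICAL - {url} unreachable: {err}"
    if 200 <= status < 400:
        return OK, f"HTTP OK - {url} answered {status}"
    return CRITICAL, f"HTTP CRITICAL - {url} answered {status}"
```

The `fetch` parameter exists only so the check can be exercised without a network; in real use `check_webroot("http://your-site/")` would be called with the default.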

And for the love of all that is compiling, keep all the monitoring and alerting in _ONE_
system!
(If you need or want to use more than one, at least aggregate the end results into _ONE_
system.)
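A rough sketch of what "aggregate the end result" could look like, assuming each monitoring system can hand over its results as (source, service, state, summary) records. The `Result` type, the `aggregate` function and the severity ranking (CRITICAL worse than UNKNOWN worse than WARNING worse than OK) are assumptions made for this example, not something the talk prescribes.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # conventional plugin states

# Assumed ranking for "worst state wins"; adjust to taste.
SEVERITY = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}


@dataclass(frozen=True)
class Result:
    source: str   # which monitoring system reported this
    service: str  # the service the result is about
    state: int    # one of OK, WARNING, CRITICAL, UNKNOWN
    summary: str  # human-readable one-liner


def aggregate(results):
    """Collapse results from several systems into one worst-state entry per service."""
    merged = {}
    for result in results:
        current = merged.get(result.service)
        if current is None or SEVERITY[result.state] > SEVERITY[current.state]:
            merged[result.service] = result
    return merged
```

The single system that does the alerting then only needs to look at the merged view, whichever tool produced the underlying result.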

Arguing about how to monitor/alert, or with
what, distracts from the fact that what you care
about is not that monitoring is done,
but what information it gives you.
AKA
Even someone pressing F5 in their browser
to verify that the site is up is more
informative than: Server has 1337 MB RAM
free, / is 42% free and there is 1
(management ;) ) zombie process

TMI!
DISK CRITICAL - free space: / 7002 MB (9% inode 60%):
/data 16273093 MB (26% inode 99%)

???
DISK CRITICAL - free space: / down to 9%

✓
DISK CRITICAL - root will be out of space in 2 h
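One way to get from "down to 9%" to "out of space in 2 h" is to extrapolate recent free-space samples. The sketch below fits a least-squares line through (hours, free MB) samples and reports when it crosses zero. It assumes roughly linear growth, and the names `hours_until_full`, `disk_message` and the 4-hour threshold are made up for illustration.

```python
def hours_until_full(samples):
    """Estimate hours until free space reaches zero.

    samples is a list of (time_in_hours, free_mb) pairs. Returns None when
    there are too few samples or free space is not shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same time
    # Least-squares slope in MB per hour; negative means the disk is filling up.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None
    intercept = mean_f - slope * mean_t
    zero_at = -intercept / slope           # time at which the fitted line hits 0 MB
    latest_t = max(t for t, _ in samples)
    return max(0.0, zero_at - latest_t)


def disk_message(mount, samples, critical_hours=4.0):
    """Build an alert line that says when the disk runs out, not just how full it is."""
    eta = hours_until_full(samples)
    if eta is not None and eta <= critical_hours:
        return f"DISK CRITICAL - {mount} will be out of space in {eta:.0f} h"
    return f"DISK OK - {mount}"
```

For example, samples of 4000, 3000 and 2000 MB free taken one hour apart lose 1000 MB per hour, so the estimate is 2 hours from the last sample.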

License: CC BY-ND 2.0, Wisconsin Department of Natural Resources

License: CC-SA, Marco Mayer
2 years and counting...

7. Alerting

8. alert(“a service wants attention”)

12. Define sane SLOs
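A quick sanity test for an SLO is to translate the percentage into the downtime it actually permits. The helper below is just that arithmetic; the function name and the 30-day window are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(slo_percent, window_days=30):
    """Minutes of downtime an availability SLO permits over the given window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    for slo in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(slo)
        print(f"{slo}% over 30 days allows about {minutes:.1f} min of downtime")
```

A 99.9% target over 30 days leaves roughly 43 minutes, which already has consequences for who gets woken up and how fast they must respond.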

17. Always fix your alerts!
