Beyond Nagios

NYC DevOps 2011/07/21
Alexis Lê-Quôc - alq@datadoghq.com
What I’m Going To Talk About

    • Super-quick Nagios summary
    • Monitoring/Alerting Pathologies
    • How to fix it
What Is Nagios?

• “Industry Standard in IT Infrastructure Monitoring”

  • For once it’s true...

• Scheduler & Notification server
(+) Robust, Mature code-base

(-) Configuration can be daunting

(-) Not human-friendly
“OVERWHELMING”
A “NORMAL” HOUR
THE “OTHER” NAGIOS UI
[Cycle diagram: Receive alerts; Process alerts & Fix things; Add more checks]
THE HAPPY START
[Cycle diagram: Ignore alerts; Missed alerts; Add more checks]
THE SPIRAL OF DEATH
[Plot: Quality of life (y-axis) vs. # of alerts (x-axis). Annotated points: "Few checks, Few alerts"; "More checks, Too many alerts"]
FIGHT OR FLIGHT
[Plot: Effective Coverage (y-axis) vs. Scale, Complexity (x-axis). Annotated points: "Few checks, Few alerts, Every host counts"; "More checks, Too many alerts, Every host still counts"; "Checks n^2, Fault-tolerant, Less urgency"]
THE TROUGH OF DESPAIR
[Plot: Effective Coverage (y-axis) vs. Scale (x-axis)]
IF ONLY I ADDED MORE CHECKS...
Reset!
Way Out
‣Breathe!
‣Measure
‣Look for Patterns
‣Put Alerts in Context
‣Focus on the Business
Turn Nagios logs into structured data

Analyze

         day         | success_pct | warning_pct | error_pct | events
---------------------+-------------+-------------+-----------+--------
 2011-07-12 00:00:00 |          89 |           0 |         2 |   9628
 2011-07-13 00:00:00 |          90 |           0 |         2 |   9210
 2011-07-14 00:00:00 |          90 |           0 |         2 |   9735
 2011-07-15 00:00:00 |          89 |           0 |         2 |   9531

MEASURE
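
A minimal sketch of how "turn Nagios logs into structured data" and the daily summary could be done. It is not the speaker's actual pipeline. It assumes the standard Nagios log line shape `[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output`, and it assumes OK maps to success, WARNING to warning, and CRITICAL to error; the function names and sample lines are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Assumed line shape:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;output
LINE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_line(line):
    """Return a dict for a SERVICE ALERT line, or None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.fromtimestamp(int(rec["ts"]), tz=timezone.utc)
    rec["attempt"] = int(rec["attempt"])
    return rec


def daily_summary(lines):
    """Aggregate parsed events per UTC day into percentage columns."""
    per_day = defaultdict(Counter)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            per_day[rec["ts"].date()][rec["state"]] += 1
    rows = []
    for day in sorted(per_day):
        counts = per_day[day]
        events = sum(counts.values())  # includes UNKNOWN and any other state
        rows.append({
            "day": day.isoformat(),
            # Assumed mapping: OK -> success, WARNING -> warning, CRITICAL -> error.
            "success_pct": 100 * counts["OK"] // events,
            "warning_pct": 100 * counts["WARNING"] // events,
            "error_pct": 100 * counts["CRITICAL"] // events,
            "events": events,
        })
    return rows


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "[1310428800] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1310428860] SERVICE ALERT: web01;HTTP;OK;HARD;1;HTTP OK",
    "[1310432400] SERVICE ALERT: db01;Disk;OK;HARD;1;DISK OK",
    "[1310436000] SERVICE ALERT: db01;Load;OK;HARD;1;OK - load average: 0.10",
    "[1310515200] SERVICE ALERT: web02;HTTP;OK;HARD;1;HTTP OK",
    "[1310515260] Auto-save of retention data completed successfully.",
]

if __name__ == "__main__":
    for row in daily_summary(SAMPLE):
        print(row)
```

Once the events are rows with a timestamp, host, service, and state, the same records can be loaded into a database or plotted instead of being read as raw log text.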
[Same daily summary table as on the previous slide]
VISUALIZATION MATTERS
[Figure labels: In Time; Flapping]
LOOK FOR PATTERNS
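
A rough sketch of two pattern checks the slide hints at: when alerts happen, and which checks flap. The flapping measure below is a simplified, unweighted share of state changes over a window of recent states. It is not Nagios's own weighted flap-detection formula, and the 50% threshold and function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def alerts_by_hour(epoch_seconds):
    """Count alerts per hour of day (UTC) to expose time-of-day patterns."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in epoch_seconds
    )


def state_change_pct(states):
    """Percentage of adjacent pairs in `states` whose state differs."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, threshold_pct=50.0):
    """Hypothetical rule: flapping if at least threshold_pct of transitions change state."""
    return state_change_pct(states) >= threshold_pct


if __name__ == "__main__":
    history = ["OK", "CRITICAL", "OK", "CRITICAL", "OK"]
    print(state_change_pct(history), is_flapping(history))
    print(alerts_by_hour([1310428800, 1310428860, 1310432400]))
```

A check that flaps can generate many notifications without a lasting problem behind them, so it is a natural candidate for retuning before adding more checks.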
PUT ALERTS IN CONTEXT
    https://app.datad0g.com/dash/dash/1000#/date_range/1310682467000.0-1310684267000.0
Ultimate (hard) question
‣Does this alert impact the business?
 ‣If so, by how much?
 ‣Assumes that you track business metrics...
 ‣And that they can be accessed programmatically

FOCUS ON THE BUSINESS
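
One possible way to put a number on "by how much", assuming a business metric (for example orders per minute) is available as `(epoch_seconds, value)` pairs. The function name, the one-hour baseline window, and the sample data are all hypothetical; this is a sketch of the idea, not a method given in the talk.

```python
from statistics import mean


def business_impact(metric, alert_start, alert_end, baseline_seconds=3600):
    """Relative change of a business metric during an alert vs. the period before it.

    `metric` is an iterable of (epoch_seconds, value) pairs.
    Returns e.g. -0.5 for a 50% drop, or None if it cannot be computed.
    """
    points = list(metric)
    baseline = [
        v for ts, v in points
        if alert_start - baseline_seconds <= ts < alert_start
    ]
    during = [v for ts, v in points if alert_start <= ts <= alert_end]
    if not baseline or not during:
        return None  # not enough data on one side of the comparison
    base = mean(baseline)
    if base == 0:
        return None  # relative change is undefined against a zero baseline
    return (mean(during) - base) / base


if __name__ == "__main__":
    # Hypothetical orders-per-minute series around an alert window.
    orders = [(0, 100), (60, 100), (120, 50), (180, 50)]
    print(business_impact(orders, alert_start=120, alert_end=180, baseline_seconds=120))
```

An alert whose window shows no measurable change in the business metric is a candidate for lower urgency; one with a large drop deserves a page.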
What applies to Nagios...
Applies to other sources too
etc...
Thanks


http://datadoghq.com
