Migrating From Monitoring to Observability
Senior Solutions Architect
13 Years of Experience in IT
4 Years at Mobius Partners
Live in San Antonio with my wife and 6 daughters
Yes, 6 Daughters
Sports Nut and Superhero Nerd and a Meme Hoarder
Craig Haessig - Craigh@mobiuspartners.com
About me
Team in Operations
1 – 10 team members
A variety of tools that no one really likes
Very reactive
Added after an outage
Siloed
Rarely helps find root cause
Monitoring – The Old Way
A New Tool is Not the Answer
There is no Magic Pill
It requires a change in Culture
It requires a change in Processes
It requires a change in Philosophy
What is Observability?
It is not a new term
It comes from System Control Theory
Observability Definition from Control Theory
A measure of how well the internal states of a system can be inferred
from knowledge of its external outputs
What is Observability?
Huh?
Being able to understand a system’s inner workings and state by
measuring its external behaviors
A measure of how well we can understand a system from the work
it does
It’s not a new word for monitoring and doesn’t replace monitoring
Observability provides deeper insights to help you find the WHY
A “digital exhaust”
What is Observability?
An observable system is one that exposes enough data about itself
so that generating information (finding answers to questions yet to
be formulated) and easily accessing this information becomes
simple. – Cindy Sridharan
What is Observability?
A Culture of Observability will be more effective than any tool
Tools will not magically “give you” observability
How much does your company value the ability to inspect and
understand your systems, workloads and behavior?
Culture of Observability
Observability
Monitoring | Observability
Tells you whether the system works | Lets you ask why it is not working
A collection of metrics and logs about a system | The dissemination of information from the system
Failure Centric | Understand system behavior
Is “the how” / Something you do | Is “the goal” / Something you have
I monitor you | You make yourself observable
Monitoring vs Observability
Black Box | White Box
Monitoring from the Outside | Monitoring from the Inside
Polling, Uptime, pings, etc. | Metrics, Logs and Traces
Status from 3rd Party Systems you rely on | Systems you own and can instrument
Still Important | Critical Source of Data for Observability
Types of Monitoring
Logs
Metrics
Traces
The Three Pillars of Observability
Logs – A record of discrete events that happened over time
Plaintext – Most common
Structured – JSON – Name/Value pairs
Collecting and storing these can be expensive but valuable
Pillar 1 – Logs
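As an illustration of the structured (JSON, name/value) form mentioned above, here is a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the `context` field name are assumptions made for this example and are not from the talk.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object of name/value pairs."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}} to add
        # searchable fields such as an order id or a status code.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes one JSON line per event (stderr by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("checkout").info("payment declined", extra={"context": {"order_id": "A-1001"}})` would emit a single JSON line that a log aggregation tool can index by field rather than by free-text search.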
Provides insights into what is happening in a system but you need
context.
Use or Build a Logging Standard for your systems
Write out logs that are useful and clear
Store and aggregate your logs – Many tools out there to do this
Over time, reduce what you don’t need.
Logs – Becoming More Observable
Log Analytics tools can help you provide context
Able to search across multiple systems in near real-time
Able to look at what happened in the past and find root cause
Create trending reports
Gain insights and learn over time how your systems behave
Single source for many types of data from multiple systems
Log Analytics
Pillar 2 - Metrics
Metrics – a set of numbers that give information about a particular
process or activity
Numeric representation of your data in a time series format
Can be used for mathematical modeling and prediction
to deliver knowledge of the behavior of your systems. – Math is
FUN!
Pillar 2 - Metrics
Logs can be used to give you metrics. Example: Counting the
number of error codes over a period of time to give you a metric.
Overhead of Metrics generation and storage is consistent. Log
collection overhead can vary compared to Metrics.
Apply labels to give context to the data.
Metrics
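The error-code example above can be sketched as follows: log events are reduced to a per-minute count, keyed by labels. The event field names (`timestamp` in epoch seconds, `service`, `status_code`) and the choice to treat 5xx codes as errors are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable


def error_counts_per_minute(events: Iterable[dict]) -> Counter:
    """Count error events per (minute, service, status_code) label set."""
    counts: Counter = Counter()
    for event in events:
        # Assumption: HTTP-style status codes, where 500 and above mean an error.
        if event["status_code"] >= 500:
            # Truncate epoch seconds to the start of the minute.
            minute = event["timestamp"] // 60 * 60
            counts[(minute, event["service"], event["status_code"])] += 1
    return counts
```

The labels (service and status code) are what give the resulting number its context: the same count can be sliced per service or per error type without going back to the raw logs.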
Instrument your code to collect application metrics
System metrics are not enough
Push Developers to identify the metrics we need to monitor the
systems
Lots of great libraries and tools out there to help
Don’t be afraid of collecting too much
Visualize your data – Build Beautiful Graphs
Metrics – Become More Observable
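One possible shape for instrumenting application code is a decorator that records call counts and durations. This is a sketch with a hypothetical in-process registry (`CALLS`, `DURATIONS`) and a made-up `lookup_price` function; a real system would more likely use one of the metrics libraries the slide alludes to.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process registries, keyed by (operation, outcome) labels.
CALLS = defaultdict(int)        # number of calls per label set
DURATIONS = defaultdict(list)   # observed durations in seconds per label set


def instrumented(operation: str):
    """Record call count and duration for the wrapped function, labeled by outcome."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "error"
                raise
            finally:
                # Runs on both success and failure, so every call is counted.
                labels = (operation, outcome)
                CALLS[labels] += 1
                DURATIONS[labels].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("lookup_price")
def lookup_price(sku: str) -> float:
    """Made-up application function used only to show the decorator."""
    prices = {"apple": 0.5}
    return prices[sku]  # raises KeyError for unknown SKUs
```

Because the developer who wrote `lookup_price` also chose the operation name and labels, the metric describes application behavior that system-level metrics alone would not show.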
Traces – a representation of a series of events that encode the
end-to-end request flow through a distributed system
Gives insights into how services interact with other services
Can see what parts of the system are performing well or poorly
Helps to identify bottlenecks
Pillar 3 - Traces
Identify areas where you feel tracing could be beneficial
Use sampling
Be patient
Work with developers to identify how to best instrument your
codebase to start tracing
Tracing – Becoming more Observable
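To make spans and sampling concrete, here is a minimal single-process sketch. The `Span`, `Tracer`, and `is_sampled` names are hypothetical, and the hash-based sampling decision is one common approach assumed for this example, not something the talk prescribes.

```python
import time
import uuid
import zlib
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    trace_id: str            # shared by every span in one request
    name: str                # the unit of work, e.g. a service call
    parent: Optional[str]    # name of the enclosing span, if any
    start: float = field(default_factory=time.perf_counter)
    duration: float = 0.0


def is_sampled(trace_id: str, rate: float) -> bool:
    """Deterministic sampling: the same trace id always gets the same decision."""
    bucket = zlib.crc32(trace_id.encode()) % 10_000
    return bucket < rate * 10_000


class Tracer:
    def __init__(self, sample_rate: float):
        self.sample_rate = sample_rate
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, trace_id: Optional[str] = None):
        parent = self._stack[-1] if self._stack else None
        # Child spans inherit the trace id so the whole request can be stitched together.
        trace_id = parent.trace_id if parent else (trace_id or uuid.uuid4().hex)
        current = Span(trace_id, name, parent.name if parent else None)
        self._stack.append(current)
        try:
            yield current
        finally:
            self._stack.pop()
            current.duration = time.perf_counter() - current.start
            if is_sampled(trace_id, self.sample_rate):
                self.finished.append(current)
```

Nesting `with tracer.span("checkout"):` around `with tracer.span("charge_card"):` yields two spans with one trace id, which is the information needed to see which step of a request is slow. Lowering `sample_rate` keeps only a fraction of traces, trading completeness for lower overhead.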
Alerting
Alert fatigue is real
Engineers become numb to noisy or false alerts
Alert on things that require action
Automate remediation before alerting
Alerts should tell you what is wrong and why
Better Alerting
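One way to cut down on noisy alerts is to fire only when a condition persists, and to put the what and the why into the message. The sketch below assumes that approach; the `evaluate_alert` function, the error-rate input, and the consecutive-check rule are illustrative choices, not something the talk specifies.

```python
from typing import List, Optional


def evaluate_alert(service: str, error_rates: List[float], threshold: float,
                   min_consecutive: int) -> Optional[str]:
    """Return an alert message only if the latest `min_consecutive` checks all breach."""
    recent = error_rates[-min_consecutive:]
    if len(recent) < min_consecutive or any(rate <= threshold for rate in recent):
        return None  # transient spike or not enough data: stay quiet
    # The message states what is wrong (which service, which metric) and why it fired.
    return (
        f"{service}: error rate {recent[-1]:.0%} above {threshold:.0%} "
        f"for {min_consecutive} consecutive checks"
    )
```

A single bad reading returns `None`, so engineers are paged only for a sustained problem that is likely to need action.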
Utilization – The average time that the resource was busy servicing
work – Memory Utilization
Saturation – the degree to which the resource has extra work
which it can’t service, often queued – CPU Run Queue Length
Errors – The count of error events
Use the USE Method
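The three USE signals can be computed from periodic samples of a resource. This is a sketch under assumed inputs: the `Sample` fields and the `use_summary` helper are hypothetical names, and averaging queue length is just one simple way to express saturation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    busy_seconds: float      # time the resource spent servicing work in the interval
    interval_seconds: float  # length of the sampling interval
    queue_length: int        # work waiting that the resource could not service yet
    errors: int              # error events observed in the interval


def use_summary(samples: List[Sample]) -> dict:
    """Summarize Utilization, Saturation and Errors over a list of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    total_interval = sum(s.interval_seconds for s in samples)
    return {
        # Fraction of time the resource was busy.
        "utilization": sum(s.busy_seconds for s in samples) / total_interval,
        # Average amount of queued work.
        "saturation": sum(s.queue_length for s in samples) / len(samples),
        # Total count of error events.
        "errors": sum(s.errors for s in samples),
    }
```

For two one-minute samples, one half busy and one fully busy with four items queued and two errors, this returns a utilization of 0.75, a saturation of 2.0, and 2 errors.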
Request Rate
Error Rate
Duration of Request
RED Method
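The RED signals can be derived from a window of request records. The sketch below assumes each request is a `(duration_seconds, failed)` pair and uses a nearest-rank 95th percentile for duration; both are choices made for this example.

```python
import math
from typing import List, Tuple


def red_summary(requests: List[Tuple[float, bool]], window_seconds: float) -> dict:
    """Compute Rate, Errors and Duration for requests observed in one time window."""
    if not requests:
        raise ValueError("at least one request is required")
    durations = sorted(duration for duration, _ in requests)
    failures = sum(1 for _, failed in requests if failed)
    # Nearest-rank percentile: the smallest value with at least 95% of requests at or below it.
    rank = max(1, math.ceil(0.95 * len(durations)))
    return {
        "rate": len(requests) / window_seconds,      # requests per second
        "error_rate": failures / len(requests),      # fraction of failed requests
        "p95_duration": durations[rank - 1],         # seconds
    }
```

A percentile is used for duration rather than a mean because a few very slow requests are exactly what end users notice and what an average tends to hide.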
Identify what your systems report
Alert when end users and customers are experiencing problems
Make this data readily available
Alert on 3 – 10 metrics
Keep it simple
Create your own method
1. Don’t try to boil the ocean
2. Add monitoring to developers’ responsibilities
• Those who built it know what to monitor
3. View from a Service/Application POV
4. Collect data
5. Alert on only actionable events
6. Don’t forget about the business – Track Business Metrics
Building an Observable Culture
Monitoring is not dead
Monitoring needs to move up the stack
Developers need to own and help instrument their code
Collect all the data
Alert smarter
Observability is not just a buzzword; it’s a Culture
Conclusion
Machine Learning to help identify issues earlier and identify trends
More Tools, More Data and More Confusion
Balancing Monolithic, Microservices and Serverless
More Responsibilities with Fewer Resources
Leverage Automation!
Transform your Culture
Future of Monitoring and Observability
Monitoring in the Time of Cloud Native – Cindy Sridharan
Monitoring and Observability – Cindy Sridharan
3 Pillars of Observability - Cengiz Han
Monitoring Isn’t Observability – Baron Schwartz
Beginners Guide to Observability – Splunk.com
Observability and Instrumentation: what they are and why they matter – Fredric Paul –
New Relic Blog
Monitoring and Observability - Ernest Mueller
USE Method
Sources and References

Migrating Monitoring to Observability – How to Transform DevOps from being Reactive to Proactive


Editor's notes

  1. Observability is not just failure centric. It is used for debugging and normal usage, not just when something is perceived to be broken
  2. Share Bad Logging Experience
  3. Enriching your data
  4. Labels, tags! Very important. Lots of tools – collectd, StatsD, Prometheus, and more. Time series database
  5. Developers add logging, need to also add metrics to their code. Prometheus aggregates before it sends it to the time series database
  6. Look up notes
  7. KISS method
  8. Observability is a Culture
  9. On prem, cloud, hybrid, containers, VMs, etc. It’s not getting simpler to monitor