Migrating Monitoring to Observability – How to Transform DevOps from being Reactive to Proactive

Migrating From Monitoring to Observability

Senior Solutions Architect
13 Years of Experience in IT
4 Years at Mobius Partners
Live in San Antonio with my wife and 6 daughters
Yes, 6 Daughters
Sports Nut and Superhero Nerd and a Meme Hoarder
Craig Haessig - Click here for LinkedIn or Craigh@mobiuspartners.com.
About me

Team in Operations
1 – 10 team members
A variety of tools that no one really likes
Very reactive
Added after an outage
Siloed
Rarely helps find root cause
Monitoring – The Old Way

A New Tool is Not the Answer
There is no Magic Pill
It requires a change in Culture
It requires a change in Processes
It requires a change in Philosophy

It is not a new term
It comes from System Control Theory
Observability Definition from Control Theory
A measure of how well the internal states of a system can be inferred
from knowledge of its external outputs
What is Observability?

Being able to understand a system's inner workings and state by
measuring its external behaviors
A measure of how well we can understand a system from the work
it does
It's not a new word for monitoring, and it doesn't replace monitoring
Observability provides deeper insights to help you find the WHY
A “digital exhaust”

An observable system is one that exposes enough data about itself
so that generating information (finding answers to questions yet to
be formulated) and easily accessing this information becomes
simple. – Cindy Sridharan

A Culture of Observability will be more effective than any tool
Tools will not magically “give you” observability
How much does your company value the ability to inspect and
understand your systems, workloads and behavior?
Culture of Observability

Monitoring | Observability
Tells you whether the system works | Lets you ask why it is not working
A collection of metrics and logs about a system | The dissemination of information from the system
Failure centric | Understand system behavior
Is "the how" / Something you do | Is "the goal" / Something you have
I monitor you | You make yourself observable
Monitoring vs Observability

Black Box | White Box
Monitoring from the outside | Monitoring from the inside
Polling, uptime, pings, etc. | Metrics, logs and traces
Status from 3rd-party systems you rely on | Systems you own and can instrument
Still important | Critical source of data for observability
Types of Monitoring

Logs
Metrics
Traces
The Three Pillars of Observability

Logs – A record of discrete events that happened over time
Plaintext – Most common
Structured – JSON – Name/value pairs
Collecting and storing these can be expensive but valuable
Pillar 1 – Logs
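
To make the plaintext versus structured distinction concrete, here is a minimal sketch of emitting one JSON object per log event with Python's standard logging module. The logger name, the field names (order_id, status) and the convention of passing extra fields under a "context" key are assumptions made for illustration, not something the deck prescribes.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object of name/value pairs."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo():
    """Log one event and return the emitted line."""
    stream = io.StringIO()
    logger = build_logger(stream)
    logger.info(
        "payment failed",
        extra={"context": {"order_id": "A-1001", "status": 502}},
    )
    return stream.getvalue().strip()
```

Because every event is a set of name/value pairs, a log analytics tool can filter on status or order_id directly instead of parsing free text.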

Provides insights into what is happening in a system but you need
context.
Use or build a logging standard for your systems
Write out logs that are useful and clear
Store and aggregate your logs – Many tools out there to do this
Over time, reduce what you don’t need.
Logs – Becoming More Observable

Log Analytics tools can help you provide context
Able to search across multiple systems in near real-time
Able to look at what happened in the past and find root cause
Create trending reports
Gain insights and learn over time how your systems behave
Single source for many types of data from multiple systems
Log Analytics

Metrics – a set of numbers that give information about a particular
process or activity
Numeric representation of your data in a time series format
Can be used for mathematical modeling and prediction
to deliver knowledge of the behavior of your systems. – Math is
FUN!
Pillar 2 - Metrics

Logs can be used to give you metrics. Example: Counting the
number of error codes over a period of time to give you a metric.
The overhead of metrics generation and storage is consistent. The
overhead of log collection can vary by comparison.
Apply labels to give context to the data.
Metrics
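
The "count error codes over a period of time" example can be sketched in a few lines. The record layout (ts, service, status), the 60-second window and the choice to treat 5xx statuses as errors are assumptions made for illustration. The service name acts as the label that gives the number its context.

```python
from collections import Counter

# Hypothetical structured log records; ts is seconds since some start time.
LOG_RECORDS = [
    {"ts": 0, "service": "api", "status": 200},
    {"ts": 12, "service": "api", "status": 500},
    {"ts": 47, "service": "web", "status": 502},
    {"ts": 61, "service": "api", "status": 500},
    {"ts": 75, "service": "api", "status": 200},
]


def error_counts(records, window_seconds=60):
    """Count 5xx responses per (window start, service) pair."""
    counts = Counter()
    for rec in records:
        if rec["status"] >= 500:
            # Round the timestamp down to the start of its window.
            window_start = rec["ts"] // window_seconds * window_seconds
            counts[(window_start, rec["service"])] += 1
    return counts


COUNTS = error_counts(LOG_RECORDS)
```

Each key in COUNTS is one labeled time-series point, for example the number of errors from the api service in the window that starts at second 60.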

Instrument your code to collect application metrics
System metrics are not enough
Push Developers to identify the metrics we need to monitor the
systems
Lots of great libraries and tools out there to help
Don’t be afraid of collecting too much
Visualize your data – Build Beautiful Graphs
Metrics – Become More Observable
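
As a rough illustration of instrumenting code for application metrics, the sketch below wraps a function so every call records a count, an error count and a duration. The in-memory dictionaries, the decorator name and the lookup_price function are all hypothetical; a real setup would hand these numbers to whichever metrics library the team already uses.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory registries, standing in for a real metrics backend.
CALLS = defaultdict(int)
ERRORS = defaultdict(int)
DURATIONS = defaultdict(list)


def instrumented(name):
    """Record call count, error count and duration under the given name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                ERRORS[name] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is counted.
                CALLS[name] += 1
                DURATIONS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("lookup_price")
def lookup_price(sku):
    """Hypothetical application function."""
    if sku != "known-sku":
        raise KeyError(sku)
    return 9.99
```

The developer who wrote lookup_price is the person best placed to decide that its call rate, error rate and latency are worth collecting, which is the point of pushing instrumentation toward the people who own the code.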

Traces – a representation of a series of events that encode the
end-to-end request flow through a distributed system
Gives insights into how services interact with other services
Can see what parts of the system are performing well or poorly
Helps to identify bottlenecks
Pillar 3 - Traces

Identify areas where you feel tracing could be beneficial
Use sampling
Be patient
Work with developers to identify how to best instrument your
codebase to start tracing
Tracing – Becoming more Observable
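
A toy tracer can show what "use sampling" means in practice. This is a sketch under stated assumptions, not a real tracing library: the Tracer class, the span names and the decision to sample once at the start of each request are all illustrative.

```python
import random
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Toy tracer that keeps finished spans in memory."""

    def __init__(self, sample_rate, rng=None):
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.finished = []

    def start_trace(self):
        """Decide once per request whether the whole trace is recorded."""
        sampled = self.rng.random() < self.sample_rate
        return {"trace_id": uuid.uuid4().hex, "sampled": sampled}

    @contextmanager
    def span(self, trace, name, parent=None):
        """Time one unit of work and record it if the trace is sampled."""
        start = time.perf_counter()
        try:
            yield name
        finally:
            if trace["sampled"]:
                self.finished.append({
                    "trace_id": trace["trace_id"],
                    "name": name,
                    "parent": parent,
                    "duration_s": time.perf_counter() - start,
                })


def handle_request(tracer):
    """Hypothetical request that calls two downstream services."""
    trace = tracer.start_trace()
    with tracer.span(trace, "checkout") as root:
        with tracer.span(trace, "inventory-service", parent=root):
            pass
        with tracer.span(trace, "payment-service", parent=root):
            pass
    return trace
```

With a sample_rate of 0.1, roughly one request in ten would be recorded end to end, which keeps the overhead low while still showing how services interact and where the time goes.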

Alert fatigue is real
Engineers become numb to noisy or false alerts
Alert on things that require action
Automate remediation before alerting
Alerts should tell you what is wrong and why
Better Alerting
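
One way to picture "remediate first, alert only when action is needed" is the small control flow below. The check, remediate and notify callables and the what/why fields are hypothetical placeholders for whatever health checks, runbook automation and paging tool a team actually has.

```python
def evaluate(check, remediate, notify):
    """Run a check, try an automated fix, and alert only if that fails.

    check() returns None when healthy, or a dict with "what" and "why".
    remediate(problem) returns True if it attempted a fix.
    notify(message) delivers the alert to a human.
    """
    problem = check()
    if problem is None:
        return "ok"
    # Re-run the check to confirm the automated fix actually worked.
    if remediate(problem) and check() is None:
        return "remediated"
    # The alert states what is wrong and why, so it is actionable.
    notify(f"{problem['what']}: {problem['why']}")
    return "alerted"
```

An engineer is paged only in the "alerted" case, and the message already carries the what and the why.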

Utilization – The average time that the resource was busy servicing
work – Memory Utilization
Saturation – the degree to which the resource has extra work
which it can’t service, often queued – CPU Run Queue Length
Errors – The count of error events
Use the USE Method
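
The three USE numbers can be computed from periodic samples of a single resource. The sample layout (busy fraction, queue length, error count) and the choice to report saturation as the peak queue length are assumptions made for this sketch.

```python
def use_summary(samples):
    """Summarize one resource from (busy_fraction, queue_length, errors) samples."""
    count = len(samples)
    return {
        # Utilization: average fraction of time the resource was busy.
        "utilization": sum(s[0] for s in samples) / count,
        # Saturation: worst backlog of queued work seen in the period.
        "saturation": max(s[1] for s in samples),
        # Errors: total error events in the period.
        "errors": sum(s[2] for s in samples),
    }


# Hypothetical CPU samples taken at a fixed interval.
CPU_SAMPLES = [(0.50, 0, 0), (0.75, 2, 0), (1.00, 5, 1), (0.75, 1, 0)]
SUMMARY = use_summary(CPU_SAMPLES)
```

For these samples the utilization is 0.75, the saturation is 5 and the error count is 1.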

Request Rate
Error Rate
Duration of Request
RED Method
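
The same idea applies to RED, this time from the point of view of a service rather than a resource. The request tuples, the 10-second window, the 5xx error rule and the use of a median for duration are illustrative assumptions.

```python
def red_summary(requests, window_seconds):
    """Summarize (status_code, duration_seconds) pairs seen in one window."""
    total = len(requests)
    errors = sum(1 for status, _ in requests if status >= 500)
    durations = sorted(duration for _, duration in requests)
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        # Middle element; for an even count this is the upper median.
        "median_duration_s": durations[len(durations) // 2] if durations else 0.0,
    }


# Hypothetical requests observed in a 10-second window.
REQUESTS = [(200, 0.1), (200, 0.2), (500, 0.9), (200, 0.3), (503, 1.2)]
RED = red_summary(REQUESTS, window_seconds=10)
```

For this window the request rate is 0.5 per second, the error ratio is 0.4 and the median duration is 0.3 seconds.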

Identify what your systems report
Alert when end users and customers are experiencing problems
Make this data readily available
Alert on 3 – 10 metrics
Keep it simple
Create your own method

1. Don’t try to boil the ocean
2. Add monitoring to developers’ responsibilities
• Those who built it know what to monitor
3. View from a Service/Application POV
4. Collect data
5. Alert on only actionable events
6. Don’t forget about the business – Track Business Metrics
Building an Observable Culture

Monitoring is not dead
Monitoring needs to move up the stack
Developers need to own and help instrument their code
Collect all the data
Alert smarter
Observability is not just a buzzword, it’s a Culture
Conclusion

Machine Learning to help identify issues earlier and identify trends
More Tools, More Data and More Confusion
Balancing Monoliths, Microservices and Serverless
More Responsibilities with Fewer Resources
Leverage Automation!
Transform your Culture
Future of Monitoring and Observability

Monitoring in the Time of Cloud Native – Cindy Sridharan
Monitoring and Observability – Cindy Sridharan
3 Pillars of Observability – Cengiz Han
Monitoring Isn’t Observability – Baron Schwartz
Beginners Guide to Observability – Splunk.com
Observability and Instrumentation: what they are and why they matter – Fredric Paul – New Relic Blog
Monitoring and Observability – Ernest Mueller
USE Method
Sources and References
