With your Digital Transformation in full swing, it’s time to transform the way you look at your systems and services. With the speed of DevOps, you need your monitoring to be faster, more agile, and more accurate. You can’t afford for your systems to be down. It’s time to look at monitoring from a different angle. Let’s explore looking from the top down rather than the bottom up. For more information, please reach out to Craig Haessig at CraigH@mobiuspartners.com.
2. Senior Solutions Architect
13 Years of Experience in IT
4 Years at Mobius Partners
Live in San Antonio with my wife and 6 daughters
Yes, 6 Daughters
Sports Nut and Superhero Nerd and a Meme Hoarder
Craig Haessig - Craigh@mobiuspartners.com
About me
3. Team in Operations
1 – 10 team members
A variety of tools that no one really likes
Very reactive
Added after an outage
Siloed
Rarely helps find root cause
Monitoring – The Old Way
4. A New Tool is Not the Answer
There is no Magic Pill
It requires a change in Culture
It requires a change in Processes
It requires a change in Philosophy
6. It is not a new term
It comes from System Control Theory
Observability Definition from Control Theory
A measure of how well the internal states of a system can be inferred
from knowledge of its external outputs
What is Observability?
8. Being able to understand a system’s inner workings and state by
measuring its external behaviors
A measure of how well we can understand a system from the work
it does
It’s not a new word for monitoring and doesn’t replace monitoring
Observability provides deeper insights to help you find the WHY
A “digital exhaust”
What is Observability?
9. An observable system is one that exposes enough data about itself
so that generating information (finding answers to questions yet to
be formulated) and easily accessing this information becomes
simple. – Cindy Sridharan
What is Observability?
10. A Culture of Observability will be more effective than any tool
Tools will not magically “give you” observability
How much does your company value the ability to inspect and
understand your systems, workloads and behavior?
Culture of Observability
12. Monitoring | Observability
Tells you whether the system works | Lets you ask why it is not working
A collection of metrics and logs about a system | The dissemination of information from the system
Failure centric | Understand system behavior
Is “the how” / Something you do | Is “the goal” / Something you have
I monitor you | You make yourself observable
Monitoring vs Observability
13. Black Box | White Box
Monitoring from the Outside | Monitoring from the Inside
Polling, Uptime, pings, etc | Metrics, Logs and Traces
Status from 3rd Party Systems you rely on | Systems you own and can instrument
Still Important | Critical Source of Data for Observability
Types of Monitoring
15. Logs – A record of discrete events that happened over time
Plaintext – Most common
Structured – JSON – Name/Value pairs
Collecting and storing these can be expensive but valuable
Pillar 1 – Logs
16. Provides insights into what is happening in a system but you need
context.
Use or build a logging standard for your systems
Write out logs that are useful and clear
Store and aggregate your logs – Many tools out there to do this
Over time, reduce what you don’t need.
Logs – Becoming More Observable
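As one possible illustration of a logging standard, the sketch below uses Python's standard `logging` module to write each event as a single JSON object of name/value pairs. The field names, the `checkout` logger name, and the `context` key are assumptions made for this example, not a prescribed standard.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object of name/value pairs."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Context passed via extra={"context": {...}} is attached to the record.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = build_logger("checkout")
log.info(
    "payment declined",
    extra={"context": {"order_id": "A-1001", "status": 402}},
)
```

Keeping the context in named fields rather than inside the message text is what lets a log aggregation tool filter and group on those fields later.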
17. Log Analytics tools can help you provide context
Able to search across multiple systems in near real-time
Able to look at what happened in the past and find root cause
Create trending reports
Gain insights and learn over time how your systems behave
Single source for many types of data from multiple systems
Log Analytics
19. Metrics – a set of numbers that give information about a particular
process or activity
Numeric representation of your data in a time series format
Can be leveraged for mathematical modeling and prediction
to deliver knowledge of the behavior of your systems. – Math is
FUN!
Pillar 2 - Metrics
20. Logs can be used to give you metrics. Example: Counting the
number of error codes over a period of time to give you a metric.
Overhead of metrics generation and storage is consistent. Log
collection overhead can vary compared to metrics.
Apply labels to give context to the data.
Metrics
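A minimal sketch of turning logs into a metric, assuming a made-up log line format of `timestamp service status`: it counts server errors (status 500 and above) per minute and uses the service name as a label. The function name and the line format are hypothetical.

```python
from collections import Counter
from datetime import datetime


def errors_per_minute(log_lines):
    """Count 5xx responses per (minute, service) label pair.

    Each line is assumed to look like: '2024-01-01T12:00:05 checkout 500'
    """
    counts = Counter()
    for line in log_lines:
        timestamp, service, status = line.split()
        if int(status) >= 500:
            # Truncate the timestamp to the start of its minute.
            minute = datetime.fromisoformat(timestamp).replace(
                second=0, microsecond=0
            )
            counts[(minute.isoformat(), service)] += 1
    return counts


sample = [
    "2024-01-01T12:00:05 checkout 500",
    "2024-01-01T12:00:41 checkout 503",
    "2024-01-01T12:00:50 checkout 200",
    "2024-01-01T12:01:02 search 500",
]
for (minute, service), count in sorted(errors_per_minute(sample).items()):
    print(minute, service, count)
```

The result is a small, fixed-size series per label no matter how many log lines were read, which is why metrics overhead stays steadier than log volume.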
21. Instrument your code to collect application metrics
System metrics are not enough
Push Developers to identify the metrics we need to monitor the
systems
Lots of great libraries and tools out there to help
Don’t be afraid of collecting too much
Visualize your data – Build Beautiful Graphs
Metrics – Become More Observable
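To make "instrument your code" concrete, here is a rough sketch of a decorator that records call counts by outcome and total duration for an application function. The in-memory registry, the `instrumented` decorator, and `place_order` are all hypothetical names for illustration; a real setup would more likely use a metrics library and export the values to a time series database.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory registry; real systems would export these values
# to a metrics backend instead of keeping them in process memory.
CALLS = defaultdict(int)
DURATION_SECONDS = defaultdict(float)


def instrumented(operation):
    """Record call counts (labeled by outcome) and total duration."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "error"
            try:
                result = func(*args, **kwargs)
                outcome = "ok"
                return result
            finally:
                # Runs on both success and failure.
                CALLS[(operation, outcome)] += 1
                DURATION_SECONDS[operation] += time.perf_counter() - start

        return wrapper

    return decorator


@instrumented("place_order")
def place_order(quantity):
    """Example application function; the business rule is made up."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return {"quantity": quantity}
```

The `(operation, outcome)` key plays the role of labels: the same counter can be sliced by operation or by success versus failure.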
22. Traces – a representation of a series of events that encode the
end-to-end request flow through a distributed system
Gives insights into how services interact with other services
Can see what parts of the system are performing well or poorly
Helps to identify bottlenecks
Pillar 3 - Traces
23. Identify areas where you feel tracing could be beneficial
Use sampling
Be patient
Work with developers to identify how to best instrument your
codebase to start tracing
Tracing – Becoming more Observable
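One common way to apply sampling is to decide from the trace ID alone, so every service that sees the same request makes the same keep-or-drop decision. The sketch below is an assumption about how that could look; `should_sample` and the ID format are hypothetical, and tracing libraries ship their own samplers.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Keep roughly `sample_rate` of traces, decided from the trace ID only.

    The same trace ID always yields the same answer, so services do not
    need to coordinate to keep or drop a whole trace together.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(sample_rate * 2**64)
    return bucket < threshold


kept = sum(should_sample(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Starting with a low rate and raising it only for the areas where tracing proves useful keeps storage and overhead manageable.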
25. Alert fatigue is real
Engineers become numb to noisy or false alerts
Alert on things that require action
Perform automation to remedy before alerting
Alert should tell you what is wrong and why
Better Alerting
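A small sketch of "automate before alerting": run the check, attempt an automated fix, re-check, and only notify a human if the problem persists, with a message that says what is wrong and why. The `evaluate` function and its callback parameters are hypothetical names chosen for this example.

```python
def evaluate(check_name, probe, remediate, notify):
    """Alert only when an automated fix did not resolve the problem.

    probe() returns (healthy, reason); remediate() attempts an automated
    fix; notify(message) pages a human. All three are supplied by the caller.
    """
    healthy, _ = probe()
    if healthy:
        return "ok"

    remediate()
    healthy, reason = probe()
    if healthy:
        return "remediated"

    # The alert states what is wrong and why, and that automation was tried.
    notify(f"{check_name}: {reason} (automated remediation did not resolve it)")
    return "alerted"


state = {"running": False}
sent = []
outcome = evaluate(
    "queue-worker",
    probe=lambda: (state["running"], "worker process not running"),
    remediate=lambda: state.update(running=True),  # e.g. restart the worker
    notify=sent.append,
)
print(outcome, sent)
```

In this usage the simulated restart succeeds, so nobody is paged; an engineer only hears about the cases that need a person.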
26. Utilization – The average time that the resource was busy servicing
work – Memory Utilization
Saturation – the degree to which the resource has extra work
which it can’t service, often queued – CPU Run Queue Length
Errors – The count of error events
Use the USE Method
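The three USE measures can be computed from periodic samples of a resource. The sketch below assumes a hypothetical `ResourceSample` record; how busy time, queue length, and error counts are actually collected depends on the resource and the tooling in use.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    busy_seconds: float      # time the resource spent servicing work
    interval_seconds: float  # length of the sampling interval
    queue_length: int        # work waiting that could not be serviced yet
    error_count: int         # error events seen during the interval


def use_summary(samples):
    """Summarize Utilization, Saturation, and Errors over a list of samples."""
    total_busy = sum(s.busy_seconds for s in samples)
    total_time = sum(s.interval_seconds for s in samples)
    total_queue = sum(s.queue_length for s in samples)
    return {
        # Fraction of time the resource was busy.
        "utilization": total_busy / total_time if total_time else 0.0,
        # Average amount of queued work per sample.
        "saturation": total_queue / len(samples) if samples else 0.0,
        # Count of error events.
        "errors": sum(s.error_count for s in samples),
    }


samples = [
    ResourceSample(busy_seconds=6.0, interval_seconds=10.0, queue_length=2, error_count=0),
    ResourceSample(busy_seconds=9.0, interval_seconds=10.0, queue_length=4, error_count=1),
]
print(use_summary(samples))
```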
28. Identify what your systems report
Alert when end users and customers are experiencing problems
Make this data readily available
Alert on 3 – 10 metrics
Keep it simple
Create your own method
29. 1. Don’t try to boil the ocean
2. Add monitoring to developers’ responsibilities
• Those who built it know what to monitor
3. View from a Service/Application POV
4. Collect data
5. Alert on only actionable events
6. Don’t forget about the business – Track Business Metrics
Building an Observable Culture
30. Monitoring is not dead
Monitoring needs to move up the stack
Developers need to own and help instrument their code
Collect all the data
Alert smarter
Observability is not just a buzzword; it’s a Culture
Conclusion
31. Machine Learning to help identify issues earlier and identify trends
More Tools, More Data and More Confusion
Balancing Monolithic, Microservices and Serverless
More Responsibilities with Less Resources
Leverage Automation!
Transform your Culture
Future of Monitoring and Observability
32. Monitoring in the Time of Cloud Native – Cindy Sridharan
Monitoring and Observability – Cindy Sridharan
3 Pillars of Observability - Cengiz Han
Monitoring Isn’t Observability – Baron Schwartz
Beginners Guide to Observability – Splunk.com
Observability and Instrumentation: what they are and why they matter – Fredric Paul – New Relic Blog
Monitoring and Observability - Ernest Mueller
USE Method
Sources and References
Editor’s Notes
Observability is not just failure centric. It is used for debugging and normal usage, not just when something is perceived to be broken
Share Bad Logging Experience
Enriching your data
Labels, tags! Very important
Lots of tools – collectd, StatsD, Prometheus, and more
Time series database
Developers add logging, need to also add metrics to their code
Prometheus aggregates before it sends it to Time Series
Look up notes
KISS method
Observability is a Culture
On prem, cloud, hybrid, containers, VMs, etc. It’s not getting simpler to monitor