Monitoring and observability can both be critical parts of a healthy IT environment. While they rely on similar data and metrics, they are not the same thing. Observability moves beyond monitoring and alerting to help detect and solve the root cause of an issue.
Observability measures the internal states of a system by examining its outputs. Successful observability strategies make it easier for teams to identify IT issues before they cause disruption. In the past, IT teams have had to be reactive when it comes to managing these challenges and it can still take days or weeks to discover the root cause of problems. Observability enables IT teams to identify anomalies that present a potential IT issue and address those anomalies before they become a major problem.
During this on-demand webinar, you will learn:
• The difference between monitoring and observability
• How observability can identify the root cause of an IT issue
• The top justifications for implementing observability in your IT environment.
5 Reasons Observability of your Mainframe and IBM i is Critical for IT
1. 5 Reasons Observability of your Mainframe and IBM i is Critical for IT
Bill Hammond | Director, Product Marketing
Ian Hartley | Senior Director, Product Management
2. Today’s agenda
• The difference between monitoring and observability
• How observability can identify the root cause of an IT issue
• Observability for your mainframe and IBM i systems
• Q & A
4. Observability allows teams to:
• Monitor modern systems more effectively
• Find and connect effects in a complex chain and trace them back to their cause
• Enable visibility for system administrators, IT operations analysts and developers into the entire architecture
Observability is the ability to measure the internal states of a system by examining its outputs.
Splunk Data Insider article – What is Observability?
5. Observability is a property of a system, like functionality or testability. Monitoring is an action you perform to increase the observability of your system.
Splunk Data Insider article – What is Observability?
6. Observability benefits
• Observability leaders are 2.1 times as likely to say that they can detect problems in internally developed applications in minutes.
• Leaders report a 69% better mean time to resolution for unplanned downtime or performance degradation.
• Leaders’ average annual cost of downtime associated with business-critical internally developed applications is $2.5 million, versus $23.8 million for beginners.
* Splunk – State of Observability 2022
7. Where is observability headed?
“More of our customers are building apps using microservices and containerized architectures, and the way you monitor performance when a container may exist for mere milliseconds is so different from earlier, less dynamic architectures.”
Observability will continue to support hybrid, multicloud infrastructures as they sprawl toward the edge and incorporate machine learning, microservices, containers and all the rest, and as the very practice of IT operations continues to evolve.
Observability tools will be easier to use, despite that complexity.
* Splunk eBook - IT+Observability Predictions 2023
9. The first rule of root cause analysis is that there is no root cause
“Catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident…. Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident.”
* ctrlstack Blog - Root Cause Analysis: How can an idea that’s wrong be so useful?
10. Finding the cause without having all the data…
• The volume of data aggregated by tools can be immense, making it difficult for I&O leaders to understand the data
• Missing data is one of the key challenges to effective troubleshooting
• Distributed system architectures increase the need for observability because such architectures can fail due to interaction between multiple systems*
* Gartner - Innovation Insight for Observability – March 9, 2022
12. Mainframe and IBM i today
• Remain mission-critical
• Central to many systems & services
• Typically, isolated environments
90%: Companies (with mainframe) indicate this is a platform for growth and long-term applications
Over 75%: Organizations (with IBM i) indicate over half of their core business applications run on IBM i
4 out of 5: Executives say their organizations need to rapidly transform, including modernizing mainframe-based apps
Here to stay…
• Good value
• Reliability
• Horsepower
Sources: Kyndryl, “Perspectives on the Modern Mainframe”, 2021; HelpSystems, “IBM i Marketplace survey results”, 2021; Wikibon, “10-Year Worldwide Enterprise IT Spending 2008-2017”
13. Correct perspective
• Mission-critical systems, services and applications
• Busy environments…hundreds of thousands of transactions per minute
• 24x7x365 availability
• Serving millions of customers
• IBM platform…just part of the puzzle
14. Relevant information
• Leverage appropriate information and sources
• Observability
• Logs
• Metrics
• Traces
• Controlling the fire hose
15. Insights and agility
Add mainframe and IBM i
• Wider visibility
• Improved agility
Capture Key Performance Indicators (KPIs)
• Enough to answer data questions
• Status
• Health
• Trends
Individual transactions may not be sensible
Improved MTTI and MTTR…fewer issues and outages
16. Cover all bases
“IT Resilience” is top of mind
• Operations
• Security
Network and Security Operations Centers
• Pervasive visibility
• Maximum availability
• Agile, proactive responses
Today, you need it all covered
17. Observability in action
Leading global bank
• Managing hundreds of applications
• Critical business services
• Serving millions of customers, worldwide
“Important business services” initiative
Down is not an option…resilience is a must
• IT systems with far-reaching impact
• Initially card payment services
• Understand all moving parts
• Gain visibility of KPIs
• Summarize in single high-level dashboard
• Leveraging Splunk IT Service Intelligence “Glass table” view
• Health, status, performance, trends
• Single view
• Ironstream for Splunk delivering mainframe information
• Finger on pulse of key IT services
• Significant “observability” improvements
• More informed team with essential data points
• Better high-level view
• Confidence the business is working!
Register now to learn how IT organizations can benefit from observability-based solutions and strategies.
Speakers: Ian Hartley
Specifically, monitoring is the act of observing a system’s performance over time. Monitoring tools collect and analyze system data and translate it into actionable insights. Fundamentally, monitoring technologies, such as application performance monitoring (APM), can tell you if a system is up or down or if there is a problem with application performance. Monitoring data aggregation and correlation can also help you to make larger inferences about the system. Load time, for example, can tell developers something about the user experience of a website or an app.
Observability, on the other hand, is a measure of how well the system’s internal states can be inferred from knowledge of its external outputs. It uses the data and insights that monitoring produces to provide a holistic understanding of your system, including its health and performance. The observability of your system, then, depends partly on how well your monitoring metrics can interpret your system's performance indicators.
Thank you, Bill,…
In this section I am going to cover 5 reasons why including your mainframe and IBM i environments in your observability strategy is critical.
Now…there are certainly more than 5 reasons why you should do this…but these 5 give you a broad view over why this is essential.
Remember…you are using a mainframe or IBM i for a reason.
This platform is part of your IT infrastructure and may be vital to many systems and services in use by your organization.
So, it is important.
With that in mind…let’s start with reason number 1…the criticality of these environments in today’s enterprise IT.
For many…mainframe and IBM i are really critical…with many processes relying on their availability and performance.
But…they DO often remain isolated and off to one side…away from what many view as “core IT”
That said…those who are invested in these platforms do not necessarily see them going away.
90%...of companies using mainframe say they are still investing in the platform
Over 75% with IBM i say more than half of their core business applications rely on it
4 out of 5…80% of executives are looking to modernize their mainframe…but they are not saying it is going away.
Fundamentally…these platforms are here to stay…because they offer great performance and reliability…as well as being embedded in so many systems and services.
And this is where reason number 2 comes in.
Because you have one of these IBM platforms playing a significant role in your IT…you need to get the correct perspective
As previously said…these play a part in systems and services that are mission-critical.
And they are busy environments…often processing hundreds of thousands of transactions per minute…with expected…or even required…availability for every hour of every day
In demand by possibly millions of customers…who depend on them…and certainly expect to do what they need…whenever and wherever they are.
But…the IBM piece of a typical stack is often just one element in a broader…more complex system.
Today…a given application will span multiple technologies…multiple platforms…multiple environments.
So, you need broad visibility and high-level perspective across all areas…not just web servers or databases. You need it all.
But…in that IT jungle…you must be able to capture what is meaningful.
Gather what matters…and quickly zero-in on what is important.
And so, we have reason number 3 – relevant information.
You have to be able to tap into what is traditionally required for “observability”:
Logs
Metrics and
Traces
Capture these and you’re on the way to observability for your mainframe or IBM i.
But…you also have to remember these platforms are probably the best metered boxes on the planet…logs, metrics, and traces are part of their DNA.
However, getting relevant information is another matter. It is very easy to be overwhelmed AND you have to make sense of what you capture.
Because…the name of the game is…ultimately…to make things better.
Improved IT resilience…improved customer experience…improved effectiveness of your IT team.
Reason number 4…is to drive better insights and agility.
Your team need the best possible defense against IT issues. They need to see across all of the IT stack…including mainframe and IBM i…as well as being able to respond as fast as possible when something IS identified…driving down those all-important mean times to identification and resolution.
To do this…you will need to focus on Key Performance Indicators…capturing enough data to answer key questions around:
Status
Health and…
Trends
Getting good observability across these puts you in a great position to be able to focus on what is important and needs attention.
However,…one thing we have heard from customers…especially those that operate with high processing volumes…is that observing individual transactions is not necessarily the best approach.
It’s like inspecting every straw in a haystack…it requires a lot of effort and comes with overhead…so you really just want to know whether the haystack is OK or starting to smoke.
Again…appropriate insight leads to improved agility. This also goes back to reason number 3…getting relevant information.
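To make the haystack point concrete, here is a minimal Python sketch of rolling individual transactions up into per-minute KPIs rather than inspecting each one. The record layout (`ts`, `ms`, `ok`) and the function name `summarize_by_minute` are assumptions made for illustration; they do not come from any product mentioned in this session.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_minute(transactions):
    """Roll individual transaction records up into per-minute KPIs.

    Each record is a hypothetical dict:
    {"ts": epoch seconds, "ms": response time, "ok": success flag}.
    """
    # Group records by the minute they fall into.
    buckets = defaultdict(list)
    for tx in transactions:
        buckets[int(tx["ts"]) // 60].append(tx)

    # Keep only the summary figures for each minute, not the raw records.
    summary = {}
    for minute, txs in sorted(buckets.items()):
        failures = sum(1 for tx in txs if not tx["ok"])
        summary[minute] = {
            "count": len(txs),
            "error_rate": failures / len(txs),
            "avg_ms": mean(tx["ms"] for tx in txs),
        }
    return summary


# Illustrative data only: two transactions in minute 0, one in minute 1.
sample = [
    {"ts": 0, "ms": 100, "ok": True},
    {"ts": 30, "ms": 300, "ok": False},
    {"ts": 61, "ms": 200, "ok": True},
]
kpis = summarize_by_minute(sample)
```

Only three numbers per minute leave the platform in this sketch, which is the kind of reduction that keeps overhead down at high transaction volumes.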
And…reason 5
You need to cover all the bases
What I mean by this is…there is no point ensuring all the windows are closed while leaving the front door unlocked. It is incomplete. Not robust.
In the context of your IT…you must have all of it under surveillance…both from an operational and a security point of view.
With “Resilience” top of mind for every IT team…it is no surprise that “observability” is getting a lot of attention.
In essence, your Network, Service, and Security Operations Centers…must have wide visibility across all aspects of IT.
Today…things no longer work in isolation. Everything is inter-related and inter-connected. There are many dependencies…and consequences…directly linked to the performance, health and status of your IT.
Alongside this…there are increasing demands and pressures from your business and your customers…
You need to leverage “observability” in the right way…every day.
With this in mind…let’s look at a real-world example.
We’re working with a leading global bank…helping with their drive towards better observability and improved IT resilience.
They manage hundreds of applications, and many rely on their mainframe infrastructure…so they need visibility from mobile to mainframe…to ensure services used by millions of customers are running well. Because…if they are not…there are many consequences…from loss of customers to regulatory fines and penalties.
The bank is initially focused on card payments…capturing Key Performance Indicators from across the spectrum of IT…and bringing this into Splunk.
Here…Splunk’s IT Service Intelligence product is being used to show a “glass table”…or single-page view of key business services…where it is easy to see the Red Amber Green status as well as understand performance trends and other metrics.
Mainframe is critical to many functions within the bank…and our Ironstream product is supplying real-time insights to their Splunk environment.
This combination of tools allows the bank to easily determine system and service health and whether they need to respond to a situation that may cause disruption.
Fundamentally, they have a finger on the pulse of their complete IT stack for key processes. This improved observability gives the team essential information…allowing them to have a better high-level view…and confidence that the business is working.
If something does step out of line…fails…or even starts to trend in the wrong direction…they can react quickly and efficiently to minimize any impact.
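As a rough illustration of the Red Amber Green idea, and of catching something that is trending in the wrong direction before it fails, here is a small Python sketch. It is not how Splunk IT Service Intelligence or Ironstream work internally; the function names, thresholds, and sample readings are all hypothetical.

```python
def rag_status(value, amber_at, red_at):
    """Map a KPI where higher is worse onto RED, AMBER, or GREEN."""
    if value >= red_at:
        return "RED"
    if value >= amber_at:
        return "AMBER"
    return "GREEN"


def trending_worse(history, window=3):
    """Return True if each of the last `window` readings rose over the one before."""
    recent = history[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough readings to judge a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical response times in milliseconds, oldest first.
response_ms = [180, 190, 210, 240, 290]
status = rag_status(response_ms[-1], amber_at=300, red_at=500)
warning = trending_worse(response_ms)
```

With these sample readings the service is still green, but the trend check flags it, which is the early signal a team would want before the status changes color.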
This bank is implementing and succeeding with observability…putting into practice the 5 reasons why it is…also…critical for your mainframe and IBM i platforms.