This document provides an overview of application performance monitoring and instrumentation. It defines key terms like metrics, sampling, frequency, and instrumentation techniques. It discusses different instrumentation approaches including passive vs active monitoring and external vs internal instrumentation. External approaches include browser real user monitoring, endpoint monitoring, and network monitoring. Internal approaches cover software, application container, application, hardware/platform, network, and log metrics. The document concludes with a live demo of an application monitoring tool.
1. An In-Depth Look at Application Performance Monitoring
June 24th, 2015
2. About Ernest Mueller
• Product Manager at Idera (@ernestmueller)
• 20 years of experience in IT, from startups to large enterprises
• Runs DevOpsDays Austin and the CloudAustin user group (theagileadmin.com)
4. What is instrumentation?
Instrumentation is the use of measuring instruments to monitor and control a process. It is the art and science of measurement and control of process variables within a production, laboratory, or manufacturing area.
- Wikipedia
5. Some Monitoring Theory
• Metric
A specific set of measurements using a specific type of instrumentation.
• Sample
One reading of a metric.
• Frequency
Your sampling rate – how often you sample the metric.
• Instrumentation Technique
How you are sampling your metric – at what point, using what method, in what depth, with what frequency.
• Instrumentation Point
The exact point in the system being instrumented.
• Instrumentation Method
Exactly how the metric is being sampled.
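The sampling terms above can be made concrete in a few lines of Python. This is an illustrative sketch, not from the talk: the queue-depth metric and the function names are hypothetical.

```python
import time

def sample_metric(read_fn, frequency_hz, n_samples):
    """Collect n_samples readings of one metric at a given sampling rate.

    read_fn is the instrumentation method (how the value is obtained);
    wherever read_fn looks is the instrumentation point.
    """
    interval = 1.0 / frequency_hz
    samples = []
    for _ in range(n_samples):
        samples.append(read_fn())  # one sample = one reading of the metric
        time.sleep(interval)       # frequency = how often we sample
    return samples

# Hypothetical metric: depth of an in-memory work queue, sampled at 10 Hz.
queue = ["job-a", "job-b", "job-c"]
print(sample_metric(lambda: len(queue), frequency_hz=10, n_samples=5))
# → [3, 3, 3, 3, 3]
```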
6. Some Monitoring Theory
Instrumentation Method Approaches
• Passive
Reading metrics off a running system without generating load (mostly).
• Active
Requesting a metric, or provoking activity from the system to create metrics.
• Application Performance Management (APM)
The monitoring and management of performance and availability
of software applications. APM strives to detect and diagnose application
performance problems to maintain an expected level of service. APM is
“The translation of IT metrics into business meaning ([i.e.] value)."
7. External Instrumentation Approaches
The most common external instrumentation approaches include:
1. Browser RUM (Real User Monitoring)
2. Global endpoint monitoring
3. Network RUM
4. Local endpoint monitoring
5. Network APM (Application Performance Monitoring)
6. Database APM
7. Network monitoring
Browser RUM (real user monitoring) uses JavaScript instrumentation embedded in Web pages to sample information on the user experience. RUM can capture the actual user experience across a wide swath of the service, but its scope is limited to what your users are currently doing, and it generates a lot of data. Web analytics are closely related to browser RUM. Pure API traffic isn't captured, and mobile apps typically require a separate implementation.
External synthetic monitoring applies synthetic Web transactions to the system from various external geographic locations. These have the benefit of repeatedly testing the service in the same way from various points and provide great performance-over-time information that can be used effectively for measuring SLA attainment. However, they can't exercise all parts of the service, and they generate load on the service in the process.
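A single synthetic probe is easy to sketch with the Python standard library; real synthetic monitoring runs the same probe on a schedule from many geographic locations and compares results over time. The URL below is a placeholder.

```python
import time
import urllib.request

def synthetic_check(url, timeout=5.0):
    """One active probe: fetch the URL, report HTTP status and latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except OSError:
        status = None  # network or HTTP failure counts as a failed transaction
    latency_ms = (time.monotonic() - start) * 1000.0
    return {"url": url, "status": status, "latency_ms": latency_ms}

# Compare latency against your SLA threshold; alert on failed probes.
print(synthetic_check("https://example.com/"))
```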
Network RUM is based on network capture of user traffic on the server side. It's limited to when you have physical access to the network, and it doesn't see activity from browser or CDN caches. However, it can see more protocols than Browser RUM and doesn't suffer from browser compatibility issues.
Internal synthetic transactions apply synthetic Web transactions to the system from inside the service network. They're very actionable for alerting (if it fails, the service is most likely down), but do not cover the full chain required to deliver the service to an end-user. A variation of this applies a probe from onboard an individual system to services running on that system itself.
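At its simplest, such a probe is a TCP connect to the service's port from inside the network. A minimal sketch, with host and port as placeholders:

```python
import socket

def port_alive(host, port, timeout=2.0):
    """Internal synthetic probe: can we open a TCP connection at all?

    Run from inside the service network; a failure here very likely
    means the service itself is down, which makes it good for alerting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_alive("127.0.0.1", 8080))  # placeholder host and port
```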
Network APM (application performance monitoring) analyzes the behavior of system components by watching the interchange on the network. It covers many protocols and provides insight into network-based performance issues, but it's blind to the complexity that lies behind that IP address.
Database APM offers deep-dive analysis of database activity and performance statistics. It provides plenty of information, including database errors and performance issues, but does not expose issues across the other 90 percent of the stack. Also, support for the explosion in diversity of NoSQL/NewSQL data stores is a challenge.
Network monitoring of network devices and flows is important to identify and diagnose network problems, but it usually does not take into account the higher-level operation of applications, services, and users that is more meaningful to the business.
Software platform metrics: Simple process uptime monitoring is the most basic method, but most Web servers, app servers, and other third-party components also surface metrics about their operation via a status page or other means. These provide another data point on uptime and performance, which helps isolate issues, but you're limited to the specific metrics the software provider decided to expose.
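For example, Apache's mod_status module (queried with ?auto) emits machine-readable key: value lines. A simplified parser sketch; the sample text below is illustrative, not a full mod_status response:

```python
def parse_status_page(text):
    """Parse key: value lines such as Apache mod_status ?auto output."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key: value shape
        value = value.strip()
        try:
            metrics[key.strip()] = float(value)  # numeric metric
        except ValueError:
            metrics[key.strip()] = value         # descriptive field
    return metrics

sample = "Total Accesses: 1024\nBusyWorkers: 3\nIdleWorkers: 7"
print(parse_status_page(sample))
# → {'Total Accesses': 1024.0, 'BusyWorkers': 3.0, 'IdleWorkers': 7.0}
```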
App container metrics: Typically, these are Java JVM metrics via JMX or code instrumentation (or similar metrics on other platforms). These deliver excellent depth to find application issues at runtime, but there are thousands of fine-grained metrics that require some sophistication to understand.
Application metrics: These surface from inside the application itself using a metrics library. They're very valuable because they are custom to the exact data you want to surface, whether it's dollars sold in your online store or number of customers served -- but your developers need to write code explicitly to surface them.
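A sketch of the kind of API such a metrics library exposes. The names (incr, timer) are illustrative; real libraries such as Dropwizard Metrics or statsd clients look similar but differ in detail.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class Metrics:
    """A minimal in-process metrics registry: counters plus timers."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, value=1):
        """Count a business event, e.g. an order or dollars sold."""
        self.counters[name] += value

    @contextmanager
    def timer(self, name):
        """Record how long a block of application code takes."""
        start = time.monotonic()
        try:
            yield
        finally:
            self.timings[name].append(time.monotonic() - start)

metrics = Metrics()
metrics.incr("store.orders")
metrics.incr("store.revenue_cents", 4999)
with metrics.timer("checkout.duration"):
    pass  # the code path being measured goes here
print(metrics.counters["store.orders"], metrics.counters["store.revenue_cents"])
# → 1 4999
```

The point the slide makes holds here too: nothing is reported unless developers add these calls to their code.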
Hardware platform metrics: Here we're talking about OS-level metrics (the ever popular CPU/memory/disk), the underlying abstraction layer, if any (for example, Amazon AWS metrics pulled using CloudWatch, virtualization layer metrics, or LXC container metrics for Docker users), and hardware metrics. They're necessary to identify resource shortfalls and provide insight into many common issues, but may or may not be representative of the service experience in the real world.
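The most basic of these are available from the standard library on POSIX systems; a sketch (the thresholds you would alert against are up to you):

```python
import os
import shutil

def host_metrics(path="/"):
    """Sample basic OS-level metrics with the standard library only."""
    load_1m, load_5m, load_15m = os.getloadavg()  # POSIX-only call
    disk = shutil.disk_usage(path)
    return {
        "cpu_count": os.cpu_count(),
        "load_1m": load_1m,
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }

print(host_metrics())
```

Cloud or virtualization metrics (e.g. CloudWatch) come from the provider's API instead and are not shown here.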
Network metrics: These metrics are gathered by sniffing the interface on each server. Otherwise, they're similar to the network APM technique discussed above.
Log aggregation: All of the above parts of the system usually dump records of events and metrics into log files, which are an alternate path to gather much of the same information. Log information is often richer than pure metrics, but it's also large in volume and often slower to collect and process when rapid information is needed.
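As a small illustration of deriving a metric from logs, this sketch computes an HTTP 5xx error rate from access-log lines. The regex is simplified and the sample lines are made up; real log formats vary.

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format access log line.
LOG_LINE = re.compile(r'"\w+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def error_rate(lines):
    """Derive a metric (fraction of HTTP 5xx responses) from raw log lines."""
    statuses = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m:
            statuses[m.group("status")[0]] += 1  # bucket by first digit
    total = sum(statuses.values())
    return statuses["5"] / total if total else 0.0

logs = [
    '1.2.3.4 - - [24/Jun/2015:10:00:00] "GET /index.html HTTP/1.1" 200 512',
    '1.2.3.4 - - [24/Jun/2015:10:00:01] "GET /api/cart HTTP/1.1" 500 88',
]
print(error_rate(logs))  # → 0.5
```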