From system performance to application metrics, we continue to further our understanding of what to monitor, why, and how to present it appropriately to the various audiences who need to act on this information. Yet there are things across our environment that we agree we can’t measure because they are unquantifiable. That doesn’t mean that there is zero signal to be analyzed and monitored.
Consider open source software that is widely used, yet becomes stale and unusable over the years as its maintainers stop keeping it up to date with security fixes and integrations with other software, or stop implementing the new features that keep it useful. How do you measure the health of the software solutions you currently have in place, so that you know when to start planning a change or committing intentional time to a project?
In this talk, I’ll tackle these questions in addition to sharing other observations about monitoring within our environments with the goal of inspiring others to examine available signals, their impact, and the value of monitoring.
Monitor the Unmeasurable Signals
1. devopsdays Portland 2016
Jennifer Davis
Twitter: @sigje
Monitor the Unmeasurable
monitored, resilient to failure, and increase value to our organization.
When Heartbleed struck across all organizations in 2014, one vector of fragility emerged. Assessing and monitoring fragility will allow us to monitor our vulnerabilities more proactively.
2.
3. CC Image courtesy of Fruit with Swedish Pancake by Janet Hudson on Flickr
Monitoring should be viewed as a stack. Maybe not a pancake stack with tasty fruit, although integrated pancake delivery with PagerDuty alerts would rock: while I wait for my event to resolve, I can eat tasty pancakes. Like any stack, monitoring is made up of layers, and everything in your stack should be monitored.
4. CC Image courtesy of Concentrated warning by Anders Sandberg on Flickr
In all my time at Yahoo, I saw a number of signals that told me something was wrong. When I went into different environments as a Chef consultant, I saw that the same problems affected environments large and small. This made me want to start talking with others in a bigger forum. What are the signals that we aren’t monitoring? How do we start monitoring them and act on them proactively rather than reactively?
5. Technology Optional
Monitoring doesn’t have to be technology driven. For example, as a manager I could track the quality of 1-1s with my reports, who is making it to meetings regularly, and how they are spending their time. If one person (our diamond in this case) is doing all the grunt work and doesn’t spend any time on projects, that may be impacting their overall happiness. Too much toil leads to unhappiness.
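Tracking how people spend their time can be as simple as a tally. A minimal sketch, assuming each report logs weekly hours as either toil or project work; the `timeLog` data, the names, the helper functions, and the 0.5 threshold are all hypothetical, chosen only for illustration:

```javascript
// Hypothetical weekly time log: hours per person, split into toil and project work.
const timeLog = [
  { name: "alex", toilHours: 30, projectHours: 10 },
  { name: "sam", toilHours: 12, projectHours: 28 },
  { name: "riley", toilHours: 8, projectHours: 32 },
];

// Fraction of logged time spent on toil (0 when nothing was logged).
function toilRatio({ toilHours, projectHours }) {
  const total = toilHours + projectHours;
  return total === 0 ? 0 : toilHours / total;
}

// Return the people whose toil share exceeds the (hypothetical) threshold.
function flagToilHeavy(log, threshold = 0.5) {
  return log
    .filter((entry) => toilRatio(entry) > threshold)
    .map((entry) => ({ name: entry.name, toil: toilRatio(entry) }));
}

console.log(flagToilHeavy(timeLog));
```

With the sample data, only `alex` (30 of 40 hours, a 0.75 toil share) is flagged. The trend over several weeks says more than any single snapshot.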
6. • Technology
• Organization
• Process
Monitor these 3 Types of signals.
CC Image courtesy of Train Signal at Brogdale Farm by Oast House Archive
I’m going to talk about 3 types of signals that are important to monitor. They’re easy to remember because they spell “TOP”: technology, organization, and process.
7. • Dependencies,
• Consumers to producers, and
• Value generation.
Monitor Technology Signals
The technology signals we are missing aren’t availability, error counts, or latencies. Those are important, but there are other signals that we may ignore. Three examples of these signals are dependencies, consumers to producers, and value generation.
8. Monitor dependencies
Monitoring dependencies is about monitoring the versioned artifacts that my artifacts depend on. In this example I’m using Berkshelf (berks) to see which Chef cookbooks the chef-client cookbook depends on. There are 3 top-level dependencies: cron, logrotate, and windows. The windows cookbook has an additional dependency on chef-handler. Ideally I pin my versions so I know exactly what works and what doesn’t.
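Once the dependency tree is available, checking for unpinned versions is a short graph walk. A minimal sketch, assuming the tree has already been exported into a plain object; the graph mirrors the chef-client example in this slide, but every version constraint, and the `isPinned`/`findUnpinned` helpers, are hypothetical. The sketch does not parse berks output.

```javascript
// Hypothetical dependency graph modeled on the chef-client example.
// Constraints use Chef-style operators ("= 1.7.0" is an exact pin,
// ">= 1.9" and "~> 1.39" allow a range); the versions are made up.
const graph = {
  "chef-client": { cron: "= 1.7.0", logrotate: ">= 1.9", windows: "~> 1.39" },
  windows: { "chef-handler": "= 1.2.0" },
  cron: {},
  logrotate: {},
  "chef-handler": {},
};

// In this sketch, a constraint counts as pinned only when it is "= <version>".
function isPinned(constraint) {
  return /^=\s*\d+(\.\d+)*$/.test(constraint.trim());
}

// Walk the graph depth-first from `root`, recording every transitive
// dependency whose constraint is not an exact pin. `seen` guards against cycles.
function findUnpinned(graph, root) {
  const unpinned = [];
  const seen = new Set();
  const visit = (name) => {
    if (seen.has(name)) return;
    seen.add(name);
    for (const [dep, constraint] of Object.entries(graph[name] || {})) {
      if (!isPinned(constraint)) unpinned.push(`${name} -> ${dep} (${constraint})`);
      visit(dep);
    }
  };
  visit(root);
  return unpinned;
}

console.log(findUnpinned(graph, "chef-client"));
```

With the sample graph, the walk reports the logrotate and windows constraints as unpinned, while cron and the transitive chef-handler dependency are pinned.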
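9. left-pad
// The entire left-pad module: prepend `ch` (default ' ') to `str` once for
// each character that `str` is short of `len`.
module.exports = leftpad;
function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  // Default to a space, but still allow 0 as a pad character.
  if (!ch && ch !== 0) ch = ' ';
  // Number of pad characters still needed.
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}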
How many people were impacted by left-pad? This is the entire left-pad module. It’s essentially a single function that left-pads a string. Many packages depended on this simple package, including Babel and React. In March 2016, the author unpublished all of his work from npm. This impacted a lot of people who didn’t host their own artifacts.
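To show how small the dependency was, here is a sketch (not part of the original talk) comparing the module with the built-in `String.prototype.padStart`, which was standardized in ES2017, after the incident. The function is repeated so the block runs on its own; the test cases are hypothetical.

```javascript
// Same logic as the left-pad module, repeated so this block is self-contained.
function leftpad(str, len, ch) {
  str = String(str);
  let i = -1;
  if (!ch && ch !== 0) ch = " ";
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

// padStart pads the start of a string to a target length. For single-character
// pads the two agree; for multi-character or empty pad strings they differ,
// so padStart is not a drop-in equivalent in every case.
const cases = [
  ["5", 3, "0"],
  ["abc", 6, undefined],
  ["toolong", 3, "*"],
];
for (const [str, len, ch] of cases) {
  console.log(leftpad(str, len, ch), String(str).padStart(len, ch));
}
```

For example, `leftpad("a", 3, "xy")` repeats the whole pad string twice and returns `"xyxya"`, while `"a".padStart(3, "xy")` truncates the pad and returns `"xya"`.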
10. Monitor Consumers to Producers.
When I talk about monitoring consumers to producers, I’m not talking about the classic producer-consumer algorithm. In this example, consumers are people who use the software but don’t contribute. Producers are people who actively collaborate with the maintainers to produce reusable solutions, i.e. solutions that help the community and not just themselves. Whether software is open source or proprietary, producers are the people working on the software.
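One way to turn this into a signal is to track the ratio of consumers to producers over time. A minimal sketch, assuming you can count both groups each month (for example, internal teams pulling an artifact versus people whose changes are accepted upstream); the snapshot data, helper names, and alert level of 10 are hypothetical:

```javascript
// Hypothetical monthly snapshots for one project.
const snapshots = [
  { month: "2016-01", consumers: 40, producers: 8 },
  { month: "2016-02", consumers: 55, producers: 6 },
  { month: "2016-03", consumers: 70, producers: 5 },
];

// Consumers per producer; Infinity when nobody is producing.
function consumerProducerRatio({ consumers, producers }) {
  return producers === 0 ? Infinity : consumers / producers;
}

// Report each month's ratio and whether it crossed the (hypothetical) alert level.
function checkRatios(series, alertAbove = 10) {
  return series.map((s) => {
    const ratio = consumerProducerRatio(s);
    return { month: s.month, ratio, alert: ratio > alertAbove };
  });
}

console.log(checkRatios(snapshots));
```

In the sample data the ratio climbs from 5 to 14 as consumers grow and producers leave, so only the last month alerts. A rising ratio is the early warning, well before it crosses any fixed line.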
11. Monitor Consumers to Producers.
An example of the danger, and of why we need to monitor this, is Heartbleed. In 2014, the OpenSSL Software Foundation published information about receiving $2000 in donations and having one full-time individual working on OpenSSL. With these kinds of investments supporting the software, it’s not surprising that a vulnerability existed in this critical software that secures hundreds of thousands of web servers. If software is important to us, we need to be monitoring consumers to producers. That doesn’t mean we should be inventing software ourselves, because that software will have the same problem. In general, if the projects you depend on don’t have adequate producers, pay some producers to do that work, whether by donating money or other resources to the open source projects.