Our monitoring team works in a cycle of four phases: Definition, Collection, Visualization, and Action. We've found that being clear about which phase we are in helps us communicate both our needs and our progress. This was presented as a lightning talk at Monitorama 2015 by Melanie Cey.
2. Responsibilities @ Yardi
Implementation and administration of monitoring, alerting, and log aggregation/analysis tools.
o 15,000+ Devices
o 9 Datacenters
o 5,000+ Customer Installations
o We monitor Windows envs with Linux envs
11. 1. Definition
I can hit this one page, so it’s up, right?
No thanks, let’s redefine status
12. 1. Definition
o What questions are you trying to answer?
o What information do you need when a failure occurs?
o What are the most common failures?
o Who is the audience for the information?
13. 2. Checks & Collections
o Environment & Code
o Data points
o Detailed logs
o Current state
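The four items above can be pictured as the fields of one collection record. The following is a minimal sketch, not the team's implementation: the `Collection` class, the `collect` function, the `latency_ms` key, the 500 ms threshold, and the 50-line log cap are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import platform


@dataclass
class Collection:
    """One collection pass for a monitored item (hypothetical structure)."""
    name: str
    environment: dict    # environment and code: OS, deployed code version
    data_points: dict    # numeric measurements
    log_lines: list      # detailed logs, most recent last
    current_state: str   # "ok", "degraded", or "unknown"
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def collect(name, code_version, measurements, log_lines, threshold_ms=500.0):
    """Build a Collection; the state rule here is an illustrative assumption."""
    latency = measurements.get("latency_ms")
    if latency is None:
        state = "unknown"
    elif latency > threshold_ms:
        state = "degraded"
    else:
        state = "ok"
    return Collection(
        name=name,
        environment={"os": platform.system(), "code_version": code_version},
        data_points=dict(measurements),
        log_lines=list(log_lines)[-50:],  # keep only the latest 50 lines
        current_state=state,
    )
```

Keeping all four kinds of information in one record means that whoever receives an alert sees the state together with the context needed to interpret it.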
20. Is “X” monitored?
When “X” goes into some degraded state
o The right people know.
o They have enough information to find the problem, recover, and later to do RCA.
o If they don’t, they will revisit Definition.
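The checklist above can be written down as a small decision function. This is a sketch under stated assumptions: `MonitoringCoverage`, its field names, and `is_monitored` are hypothetical and simply restate the bullets; the answers themselves would come from people reviewing a real degraded state.

```python
from dataclasses import dataclass


@dataclass
class MonitoringCoverage:
    """Answers to the checklist for one item "X" (hypothetical fields)."""
    right_people_notified: bool
    enough_info_to_find_problem: bool
    enough_info_to_recover: bool
    enough_info_for_rca: bool


def is_monitored(coverage):
    """Return (monitored, next_step) for one item."""
    # Collect the names of every checklist item that was answered "no".
    gaps = [name for name, ok in vars(coverage).items() if not ok]
    if not gaps:
        return True, "no action needed"
    # Any gap sends the team back to the Definition phase.
    return False, "revisit Definition: " + ", ".join(gaps)
```

For example, `is_monitored(MonitoringCoverage(True, True, False, False))` reports that "X" is not yet monitored and names the recovery and RCA gaps as the reason to revisit Definition.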