3. The objective of metrics is to
make pretty graphs...
in order to understand the performance
and capacity
of your systems and how they vary over time.
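To see how a metric varies over time, you record timestamped samples and summarise them over windows. Below is a minimal sketch, not a real metrics pipeline. It assumes one hypothetical metric (CPU utilisation in percent for a single server) sampled into an in-memory list, and it averages the samples per hour.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (unix_timestamp_seconds, cpu_percent) for one server.
samples = [
    (0, 20.0), (60, 25.0), (120, 30.0),
    (3600, 60.0), (3660, 70.0), (3720, 80.0),
]

def hourly_averages(points):
    """Group samples into one-hour buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts // 3600].append(value)  # bucket index = hour since epoch
    return {hour: mean(values) for hour, values in sorted(buckets.items())}

print(hourly_averages(samples))  # one average per hour, ready to graph
```

Plotting these per-window averages gives the "pretty graph". A sustained rise like the one in this made-up data is what capacity planning looks for.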
4. The objective of monitoring is to…
make the Operations-guy-on-call’s life hell.
5. The objective of monitoring is to
check that the system is working as expected
and take action if some component isn't.
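A check compares what the system is doing with what is expected and reports a status that something else can act on. The sketch below follows the widely used Nagios plugin convention: status codes 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, plus a one-line message. The disk-usage metric and the 80%/90% thresholds are illustrative assumptions.

```python
import shutil

# Status codes from the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a measured value onto a status code and a one-line message."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"

def check_disk(path="/", warn=80.0, crit=90.0):
    """Measure disk usage at `path`; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total, warn, crit)

code, message = check_disk("/")
print(message)
# A real plugin would finish with sys.exit(code) so the scheduler sees the status.
```

The "take action" half lives outside the check. The monitoring system reads the status code and decides whether to alert someone or run a remediation.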
6. “Those who cannot remember the past are
condemned to repeat it” - George Santayana
So here’s a case study…
7. A long time ago in a data centre far, far away…
9. A complete system includes the humans who run it!
Human Factors Engineering.
http://en.wikipedia.org/wiki/Human_factors
2 x Linux Engineers
1 x Network Engineer
1 x Do Anything Guy
1 x Developer
13. Large Development team
External Consultants
ITIL Process people
5 x Linux Engineers
1 x Network Engineer
2 x Database Administrators
and
Part of an Infrastructure team that included
Virtualization specialists
Storage specialists
Hardware specialists
20. What’s different about the cloud?
• Servers come and go
• Sometimes automatically with auto-scaling
• Topologies and architectures change rapidly
• Driven by configuration management systems
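When servers come and go, the set of hosts being monitored has to be recomputed continually instead of being edited by hand now and then. Below is a minimal sketch of that reconciliation step. It assumes the current host list can be fetched from the configuration management system or the cloud provider. Here it is hard-coded, and the host names are hypothetical.

```python
def reconcile(monitored, discovered):
    """Return (to_add, to_remove) so the monitored set matches what exists now."""
    monitored, discovered = set(monitored), set(discovered)
    return sorted(discovered - monitored), sorted(monitored - discovered)

# Hypothetical host names: monitoring still knows web-1 and web-2,
# but auto-scaling has since terminated web-1 and launched web-3 and web-4.
monitored = ["web-1", "web-2", "db-1"]
discovered = ["web-2", "web-3", "web-4", "db-1"]

to_add, to_remove = reconcile(monitored, discovered)
print("add:", to_add)
print("remove:", to_remove)
```

Run on a schedule or on each configuration management run, this keeps monitoring in step with the infrastructure. Otherwise terminated instances would raise false alarms and new ones would go unwatched.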
21. The problems with Nagios
• Clunky UI.
• Monolithic design.
• Hard to scale.
• Hard to add nodes dynamically.
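Part of why adding nodes dynamically is hard: Nagios learns about hosts from static object definitions in its configuration files. Each new node therefore means writing a definition and reloading Nagios. The sketch below shows what generating those definitions from a host list involves. It assumes a host template named `linux-server` exists in the local configuration, and the host names and addresses are hypothetical.

```python
def host_definition(name, address, template="linux-server"):
    """Render one Nagios host object definition (template name is an assumption)."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {address}\n"
        "}\n"
    )

# Hypothetical hosts; in practice these would come from the inventory.
hosts = {"web-3": "10.0.1.13", "web-4": "10.0.1.14"}
config = "\n".join(host_definition(n, a) for n, a in sorted(hosts.items()))
print(config)
# The generated file still has to be placed where Nagios reads its object
# configuration, and Nagios reloaded, before the new hosts are checked.
```

Even when automated like this, every scale-up or scale-down event still ends in a file rewrite and a reload.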