I've gone M.A.D - monitoring aided development: the backstory of a
sysadmin going all the way to development and back. Monitoring and
metrics are the bread and butter of any admin and a must-have for any
production infrastructure. I will present how certain development
techniques are already mirrored in operations without either group
being aware of it, and how developers can take advantage of monitoring
systems to feed back into the development process. Furthermore, I will
highlight how a common language developed around testing can help the
two groups communicate and increase the level of trust, which is
fundamental for any functional organization and the true nature of
devops.
3. Monitoring and metrics: a
lifetime love
[Chart: love for monitoring plotted over career]
FOSDEM11 Spike Morelli / fsm@spikelab.org
4. One day in data
warehousing...
“There is no such thing as
too much data, only data
you don't know how to make
sense of”
5. do mind
information overflow
6. You, yes you, do you test your code?
12. Complexity is your enemy
➠ Callgraph and NESTING
➠ Number and size of functions
➠ Code CLOSURE
➠ Complexity of the build system
18. HOW > IF
“It is never too early to
start monitoring your
application's behaviour”
19. Write code that is
monitoring friendly
20. OPS is changing
➠ Configuration management
➠ Infrastructure-as-code
➠ Behaviour driven development
➠ Cucumber
➠ Robotframework
➠ Continuous integration for OPS
21. ...and you can help
➠ If you are an op
➠ Realize and accept that you code
➠ There are lots of good reusable
development patterns
➠ Advertise your achievements
➠ Engage your developers
22. ...and you can help
➠ If you are a dev
➠ Treat ops as developers, don't exclude
them
➠ Share the knowledge
➠ Code applications that are easier to
monitor
➠ Learn from your ops how to make
your app production ready
23. The most important metric
TRUST
24. Don't let uncertainty drive
you insane,
go M.A.D.
Want to share my experience: what I learned going dev and what I brought back.
SHOW OF HANDS: how many admins, how many devs? How many do both? Do we have QA?
* Start by saying I strongly believe that everybody codes
* As a sysadmin you start with lots of one-liners
* When those are not enough, bash scripts
* When it gets more complex, Perl, Python, Ruby scripts
* Still doesn't feel like coding
* You're looked upon as a non-dev
* But was “hello world” coding? Why?
* Yes, maybe not a great dev, but still
* This is bad, it creates the wrong culture
* You can test bash scripts
* Another important stepping stone: infrastructure as code
* Puppet manifests
* What about QA? Test scripts and automation, isn't that code?
* So you see, we all code
* Loved monitoring since day one
* The simple idea of knowing Nagios was checking stuff for me made me sleep better
* Then discovered metrics, a world of wonders
* A complete change of perspective: not only do you know if something is broken, you also know how it's doing
* How am I supposed to answer any question on the state of my systems?
* Think of it as if you were a doctor
* Western doctors vs Chinese doctors
* Anecdotal Chinese doctor: you didn't pay when you were sick. No longer true, unfortunately
* I was on loan to data warehousing (dw) for a few months
* That's where my passion for data and visualization was born
* Met a fellow data monger who was amazing: he could dig the most amazing things out of piles of data, and he was usually right
* The whole process was like magic, like looking at the encrypted Matrix and seeing people
* The guy could look at a pile of data and see patterns, questions and answers
– next
* Before sounding completely nuts about metrics
* You can be misled
* You can miss things in the noise
* It requires more effort to make sense of things
* There are no free lunches, but the benefit can be huge
* Didn't use tests: too hard, didn't understand them, didn't feel like a dev
* Worked in the right place, was exposed to tests, it INSPIRED ME
* Still no tests because the team was SE and I was complacent
* At a startup making a product I've got no excuse, especially being alone with no QA
* Bugs will make you lose customers
* Started with unit testing, then TDD
* Realized I was already doing TDD with Nagios: the check was deployed before the service, and it passed once the service was live
* Testing upfront was awesome, confidence went up
* Use TDD to do security
* New metrics! Added tests, success rate, monitor my progress
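The "TDD with Nagios" idea in the notes above can be sketched as a check that is written and deployed first, fails while the service does not exist, and turns green once the service goes live. This is only an illustration, not the check used in the talk: `check_tcp` is a hypothetical helper name, and the only thing taken from Nagios is the standard plugin exit-code convention (0 for OK, 2 for CRITICAL).

```python
import socket

# Exit codes from the standard Nagios plugin convention.
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=2.0):
    """Return (exit_code, message) in the style of a Nagios plugin.

    Hypothetical helper: deploy it before the service exists (it reports
    CRITICAL, the "red" test) and watch it turn OK when the service is live.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

A wrapper script would print the message and exit with the returned code so that the monitoring system can schedule it like any other check.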
* Says something at a glance
* I was sloppy at testing
* Until I wanted to release
* And I diverged again afterwards
* So I started to do TDD and didn't diverge anymore
* Would I have been able to notice without this graph?
* The graph empowered me to spot a behaviour
* What's the benefit? You know, and you can prove the effectiveness
– next
* But then you wonder, how much code am I testing?
* Adding tests alone is not meaningful
* You wonder about them
* In Nagios you did the same: you test a port, but what does that cover?
* More and more people do end-to-end checks
* Again, you can notice something at a glance
* Adding tests does not mean adding coverage
* Also, see how for 0.2 I hit 100% but I've only added a few tests?
* Let's see the next slide
* You gotta start somewhere
* That somewhere is usually wrong :)
* Nonetheless, I found 'em useful for something
== Next slide
* Remember from the previous slide that I hit 100% without adding many tests?
* Once again metrics tell us something we'd otherwise have missed
* They tell us why we achieved 100%
== Next
* But LOC is a poor metric after all
* More or fewer lines of code aren't that important
* What matters is complexity!
* Complexity is your enemy
* KISS
* Calculating complexity is a huge topic, you could make a whole talk on it
* No secret formula, I just got the idea of using scoring from spam-filtering systems
* So what are interesting metrics about complexity?
* Ask audience: can anybody think of another?
http://michaelfeathers.typepad.com/michael_feathers_blog/2011/01/measuring-the-closure-of-code.html
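The notes mention scoring complexity from a handful of simple measurements rather than a secret formula. A minimal sketch of that idea, assuming Python source and using only the standard `ast` module, could look like the following. The function names and the `WEIGHTS` values are hypothetical and chosen purely for illustration; they are not the metrics or weights used in the talk.

```python
import ast

# Control-flow nodes that count as one extra level of nesting.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)

# Hypothetical weights, in the spirit of spam-filter style scoring.
WEIGHTS = {"functions": 1, "longest_function": 2, "max_nesting": 5}


def max_nesting(node, depth=0):
    """Deepest level of nested control-flow blocks found below `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def complexity_metrics(source):
    """Collect number of functions, longest function (in lines) and nesting."""
    tree = ast.parse(source)
    functions = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    sizes = [f.end_lineno - f.lineno + 1 for f in functions]
    return {
        "functions": len(functions),
        "longest_function": max(sizes, default=0),
        "max_nesting": max_nesting(tree),
    }


def complexity_score(metrics):
    """Weighted sum of the metrics; higher means more complex."""
    return sum(WEIGHTS[name] * value for name, value in metrics.items())
```

Tracked per release, a score like this gives a trend to graph alongside test count and coverage; the absolute number matters less than how it moves.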
* Once again metrics tell us a story
* Stories are far more powerful than numbers
* Style matters
* Good code is more readable
* Easier to understand and refactor
* Less prone to contain bugs
* It makes you feel better
* Discussed lots of metrics
* Wanted to make the point that they should help you, not hinder you
* People say “create a metric and people will game it”
* Lots of books on psychology make a scary point about rewards
* By all means toss them if they hurt you
* Remind yourself about the peril
* With great power comes great responsibility. The same goes for metrics.
* Quite simple really
* Moved it into a CI
* Buildbot would work too
– Next slide
* I was so excited by these metrics and by developing that I forgot where I came from
* That's where I come from
* How many people here have actually been on-call before?
* Hard to appreciate if you've never been on-call in an ops team
* Had been a sysadmin for a while, but not on-call, and it was a shock
* After I left, every time I was in a public place and someone had the same ringtone I twitched
* SO WHAT CAN YOU DO?
* How long do your unit tests take?
* Has this changed?
* How much CPU or memory was used during the last global pass?
* Do you build your software? How long does that take? Has that changed?
* Do you run integration tests? That's like live, monitor those systems!
* Cheap way to profile your app
* Very useful, and it makes it possible to spot a memory leak or CPU increase
If it's never too early to start monitoring your app, it's never too early to make your app easy to monitor.
We're making good progress: think of the memcached management interface, MySQL session variables.
MAKE COLLECTION EASY
SOME EXAMPLES:
* HTTP REQUEST TIMING
* SIZE OF REQUEST
* DB QUERY TIME
* NUMBER OF CONNECTED CLIENTS
* TOP REQUESTS
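To make "monitoring friendly" concrete, here is a minimal sketch of an application keeping its own counters for a few of the examples listed (request timing, connected clients, top requests) and exposing them in one call. The `Metrics` class and its method names are hypothetical, invented for this illustration; a real application would more likely publish the same numbers through a status endpoint or an existing metrics library.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class Metrics:
    """Hypothetical in-process registry an app could expose to monitoring."""

    def __init__(self):
        self.request_count = Counter()           # requests seen per path
        self.request_seconds = defaultdict(float)  # total time spent per path
        self.connected_clients = 0               # updated by the app itself

    @contextmanager
    def timed_request(self, path):
        """Time the wrapped block and account it to `path`."""
        started = time.perf_counter()
        try:
            yield
        finally:
            self.request_count[path] += 1
            self.request_seconds[path] += time.perf_counter() - started

    def snapshot(self, top=3):
        """Return the current numbers in a form that is easy to collect."""
        return {
            "connected_clients": self.connected_clients,
            "top_requests": self.request_count.most_common(top),
            "avg_seconds": {
                path: self.request_seconds[path] / count
                for path, count in self.request_count.items()
            },
        }
```

A request handler would wrap its work in `with metrics.timed_request(path):`, and a collector would poll `snapshot()`; the point is that the application does the counting, so ops does not have to reverse-engineer it from logs.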
* CM has been a long-time resident (cfengine)
* But now many more people do it (still lots to do though)
* Infra-as-code is a result of CM. The name says it all
* Consulting for a company now that uses Jenkins, and we're building one in ops to do CI for scripts and Puppet manifests
* As we move in this direction we need more access to devs to be successful
* Use a VCS
* Branch
* Test
* Anybody want to guess?
* Trust in your code
** That's why you test: test that your code works, trust that you can change it
* Trust in the people
** Stealing from Patrick Debois
** Trust tax
* At the end of the day you don't want to test for fun
* Test for profit, to do a good job
* You can't do a good job without trusting yourself and the people who work with you
* Especially true across departments (dev/ops)
* I BELIEVE THAT TESTING CAN BE THE PLATFORM TO BUILD TRUST UPON AND FOSTER THE CULTURAL CHANGE THAT DEVOPS IS ALL ABOUT