Monitoring at a SAAS Startup: Tradeoffs and Tools

Bridget Kromhout

8thbridge.com
small social commerce startup
acquired in the last week by Fluid, Inc.
small devteam
I am the ops team

twisty maze of little shell scripts
bespoke artisanal monitoring
difficult to modify; doesn’t scale
http://www.pcgameshardware.de/screenshots/1280x1024/2007/07/CA01.jpg

New Relic
pros:
nice graphs
application-level view
good error analysis
cons:
slow to update
many false-positive alerts
high prices (better now)

Motivating Change
http://99designs.com/illustrations/contests/illustration-pagerduty-161025/entries

https://laur.ie/blog/2014/02/why-ill-be-letting-nagios-live-on-a-bit-longer-thank-you-very-much/
“Horrendous interface”
“Well, it’s more “old” than anything else. At least everything is in the same place as you left it because it’s been the same since 1912.”

“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”
-- @murphy_slaw (via @lozzd)

HBase: monitor all the ports?!?
hbck: the HBase consistency checker
nagios -> bash script -> parsing output of hbck
http://www.ymc.ch/en/how-to-monitor-hbase-health-by-nagios
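The nagios -> script -> hbck chain above can be sketched in Python instead of bash. This is a minimal sketch, not the script from the talk: it assumes hbck's report contains a "Status: ..." line and an "N inconsistencies detected." line, and the function name and messages are hypothetical. Only the exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hbck(output):
    """Map hbck text output to a Nagios (exit_code, message) pair.

    Assumes the report has a 'Status: ...' line; anything else is UNKNOWN.
    """
    status = re.search(r"^Status:\s*(\w+)", output, re.MULTILINE)
    if status is None:
        return UNKNOWN, "HBCK UNKNOWN - no Status line in hbck output"
    if status.group(1) == "OK":
        return OK, "HBCK OK"
    # Report the inconsistency count when the summary line is present.
    count = re.search(r"^(\d+) inconsistencies detected", output, re.MULTILINE)
    detail = count.group(1) if count else "?"
    return CRITICAL, "HBCK CRITICAL - %s inconsistencies (%s)" % (
        detail, status.group(1))
```

A real plugin would capture the output of hbck with subprocess, print the message, and exit with the returned code so Nagios can pick up the state.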

adding alert after alert after...

http://modiinhub.com/wp-content/uploads/2014/02/logo-mongodb-tagline.png

MMS (MongoDB Monitoring Service)

“cyber” monday:
1988 called; wants its word back.
the rewards of hubris
MMS showed the issue
but we weren't alerting on it
didn't understand the global write lock

If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it. -- @indec
http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Graphite & StatsD
➔ Graphite
◆ Store and visualize time-series data
◆ http://graphite.readthedocs.org/
➔ StatsD
◆ Measure everything! (Timers, counters, events, …)
◆ https://github.com/etsy/statsd/
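To make "measure everything" concrete, here is a minimal sketch of what a StatsD client puts on the wire: one small UDP datagram per measurement, in the form name:value|type. The metric names in the usage note are hypothetical, and the host and port are assumptions (8125 is StatsD's default listener).

```python
import socket


def counter(name, value=1):
    """Format a StatsD counter increment, e.g. 'signups:1|c'."""
    return "%s:%d|c" % (name, value)


def timer(name, millis):
    """Format a StatsD timer sample in milliseconds, e.g. 'render:320|ms'."""
    return "%s:%d|ms" % (name, millis)


def send(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget one metric over UDP (no reply is expected)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()
```

For example, send(counter("checkout.completed")) and send(timer("api.latency", 320)). Because the transport is UDP, instrumenting application code this way costs very little even if the StatsD daemon is down.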

Where we were
➔ Graphite 0.9.9 (wanted 0.9.12)
◆ over 2 years old
◆ missing new features (Consolidate by!)
➔ StatsD was newish, but…
◆ hand-rolled
◆ running in a screen session
◆ on a special snowflake box
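Why "Consolidate by" was worth an upgrade: when a graph has more datapoints than it can draw, Graphite merges neighbouring points, averaging them by default, and consolidateBy() lets you pick sum, max, or min instead. The sketch below is a toy model of that idea, not Graphite's implementation; the function name and bucket size are assumptions for illustration.

```python
def consolidate(values, points_per_bucket, func="average"):
    """Merge each run of `points_per_bucket` values into one value.

    None marks a missing datapoint and is ignored inside a bucket.
    """
    funcs = {
        "average": lambda xs: sum(xs) / len(xs),
        "sum": sum,
        "max": max,
        "min": min,
    }
    fn = funcs[func]
    out = []
    for i in range(0, len(values), points_per_bucket):
        bucket = [v for v in values[i:i + points_per_bucket] if v is not None]
        out.append(fn(bucket) if bucket else None)
    return out
```

With values [1, 9, 2, 2] and two points per bucket, averaging flattens the spike of 9 down to 5, while "max" keeps it visible. That difference matters when the graph is supposed to show peaks.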

Community cookbooks?
➔ Graphite ones good, but…
◆ focus on Apache (we use nginx)
◆ we haven’t moved to Chef 11 (gasp!)
➔ StatsD
◆ https://github.com/librato/statsd-cookbook
◆ launches daemons via upstart
◆ generates config file based on attributes

Graphite cookbook (Part 1)
➔ Install in a virtualenv (django, uwsgi, nginx)
➔ Dependencies recommended
◆ https://github.com/graphite-project/graphite-web/blob/master/requirements.txt
➔ libcairo2-dev package on Ubuntu 12.04 LTS
➔ install graphite’s 3 parts via pip

Graphite cookbook (Part 2)
➔ graphite-web
◆ Django app, renders graphs
➔ whisper
◆ fixed-size database for storing time-series data
◆ like RRD
➔ carbon
◆ carbon-cache.py - stores data
◆ carbon-aggregator.py - buffers, then stores
◆ carbon-relay.py - for sharding/replication
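What carbon actually receives is simple: in the plaintext protocol, each datapoint is one line of "metric.path value timestamp". The sketch below formats and sends such lines. The metric path in the usage note is hypothetical, and the host and port are assumptions (2003 is carbon's default plaintext listener).

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one datapoint for carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_to_carbon(lines, host="127.0.0.1", port=2003):
    """Send already-formatted lines to a carbon daemon over TCP."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()
```

For example, send_to_carbon([carbon_line("stats.web.requests", 42)]). Since the format is plain text, these are also the lines you would expect to see when inspecting the traffic on the wire.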

when in doubt: tcpdump is your friend
http://blog.johngoulah.com/2012/10/looking-under-the-covers-of-statsd/

carbon-aggravator (between 0.9.10 & 0.9.12)
# If set true, metric received will be forwarded to DESTINATIONS in addition to
# the output of the aggregation rules. If set false the carbon-aggregator will
# only ever send the output of aggregation.
FORWARD_ALL = True
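What the flag changes, as a toy model: the sketch below is not carbon's code, and the single sum-by-suffix rule, function names, and metric paths are all hypothetical. It only illustrates the behaviour the config comment describes: with forwarding on, the raw metrics go out alongside the aggregate; with it off, only the aggregate does.

```python
def aggregate_sum(received, suffix, output_path):
    """Toy aggregation rule: sum every metric whose path ends with `suffix`."""
    total = sum(value for path, value in received if path.endswith(suffix))
    return [(output_path, total)]


def aggregator_output(received, suffix, output_path, forward_all):
    """Return what this toy aggregator would send to its destinations."""
    out = aggregate_sum(received, suffix, output_path)
    if forward_all:
        # Raw metrics are passed through in addition to the aggregate.
        out = list(received) + out
    return out
```

Given web1.requests = 3 and web2.requests = 4, forward_all=False yields only all.requests = 7, while forward_all=True yields all three metrics. If the per-host graphs go quiet after a change to this setting, this is one place to look.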

Carbonate
whisper-fill.py
backfill datapoints between whisper files
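The backfill idea, as a toy model: copy datapoints from a source series into a destination series only where the destination has nothing. This sketch works on plain dicts of timestamp to value and is not how whisper-fill.py itself is implemented (the real tool operates on whisper files); the function name is hypothetical.

```python
def backfill(src, dst):
    """Return a copy of `dst` with missing points filled from `src`.

    Both arguments map timestamp -> value; None means "no data".
    Points already present in `dst` are never overwritten.
    """
    filled = dict(dst)
    for ts, value in src.items():
        if value is not None and filled.get(ts) is None:
            filled[ts] = value
    return filled
```

This is the shape of operation you want after moving or rebuilding a Graphite node: the new files keep whatever they have collected, and the gaps are filled from the old ones.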

2am: sudden drop-off
8am: look at graphs: ?!?!
10am: and we’re back.

the ideal monitoring solution...
❏ finds real problems
❏ actionable alerting
❏ usable by all
❏ …?
http://www.quickmeme.com/img/f5/f512ff9bee084263df5571d3c81388019dcb063173e1dbcfa2babac9274576b6.jpg

What we’re actually using now
StatsD
Application-level error analysis
Alarms for autoscaling
Timers & counters
Log & host-level
Hadoop & HBase
visualization
MongoDB
Graphs
Time-series data graphing
client-side plugins
External uptime checks
oncall rotation/alerting
Threshold-based alarms
Dashboard

Discuss!
Twitter: @bridgetkromhout
Email: bridget@kromhout.org
