Clearspring runs several hundred servers across two data centers, which together receive 3.5 billion events per day from Java, Scala, and PHP applications. To monitor this infrastructure, the team uses Nagios to check server health and Ganglia to collect metrics from applications and servers; those metrics are then forwarded to Graphite for long-term storage and visualization. The applications are instrumented with the Codahale Metrics libraries, which provide metric types such as gauges, counters, histograms, and timers for understanding performance and detecting issues. Graphing these metrics in Graphite has helped Clearspring's small operations team decide which conditions to alert on, so that problems can be addressed quickly.
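
To make the four metric types concrete, here is a minimal sketch in Python. It is not the Codahale Metrics API and not Clearspring's code: the class names (Counter, Gauge, Histogram, Timer), the graphite_line helper, and the example metric path are all hypothetical and exist only for illustration. The helper formats a sample as "path value timestamp", the line format of Graphite's plaintext protocol; in the setup described above, metrics reach Graphite by way of Ganglia, so the helper only illustrates what a stored data point looks like.

```python
import time
from statistics import mean


class Counter:
    """A running total that is incremented, e.g. events received."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class Gauge:
    """An instantaneous reading, obtained by calling a function on demand."""

    def __init__(self, read):
        self._read = read

    @property
    def value(self):
        return self._read()


class Histogram:
    """Keeps every recorded value and summarizes their distribution."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(value)

    def summary(self):
        ordered = sorted(self.samples)
        return {"min": ordered[0], "max": ordered[-1], "mean": mean(ordered)}


class Timer:
    """Context manager that records elapsed seconds into a Histogram."""

    def __init__(self):
        self.durations = Histogram()

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.durations.update(time.perf_counter() - self._start)
        return False  # do not swallow exceptions raised in the timed block


def graphite_line(path, value, timestamp):
    # One sample as a Graphite plaintext line: metric path, value, and
    # Unix timestamp in whole seconds, terminated by a newline.
    return f"{path} {value} {int(timestamp)}\n"
```

A counter answers "how many so far", a gauge answers "what is the value right now", a histogram describes the spread of a set of values, and a timer is a histogram of durations. A production library would typically keep a bounded sample rather than every value, which this sketch does not attempt.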