METRICS-DRIVEN ENGINEERING at Etsy
Kellan Elliott-McCrea, VP of Eng.
kellan@etsy.com @kellan




Tuesday, June 5, 12
What is Etsy?



8.5+ million items in the marketplace

400,000+ active

$300+ million in sales in 2010
~$41 million/month

> $1000 / minute

> 1 billion page views / month

business in over 150 countries

deploy the site, every ~20 minutes

engineering team grew ~4x in 2010
Metrics?



Logs, Graphs, Trends, and Correlations

Metrics Driven?

Making Decisions

How many visitors are using this thing?

Can we deploy that to 100% of our visitors?

Did we make it faster?

Did I just break something?
Q. WHO MAKES THESE GRAPHS?
A. Well, the Ops team manages the racks, the network, the servers, installed monitoring tools, wears the pagers, blah, blah, blah...




but... Engineers build the application.

Dev + Ops

ACCESS

Yes!   No.

“Engineers are too busy!”

Here’s the BIG SECRET...

... MAKE IT EASY!

Simple, open source tools
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)



Ganglia
★ cluster oriented
★ huge community contributed recipes
★ 2.0 released today (including several Flickr and Etsy patches!)
★ gmetad makes it easy to track custom metrics
Graphite
★ super flexible collection and display
★ per metrics buckets
★ single instance
★ super easy to write and use custom display functions

Logging
Logger::log_error("User login failed. Reason: $msg for $username", "login");




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [error] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [error] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [error] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [info] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [info] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [info] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...
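Because every line follows the same bracketed layout, it is easy to split mechanically. The following is a minimal Python sketch, not Etsy's code: the field names (`host`, `timestamp`, `level`, `namespace`, `ident`) are labels chosen here for illustration, and the slides do not say what the number in the fifth position means.

```python
import re

# Layout assumed from the sample line on the slide:
# host [timestamp] [level] [namespace] [number] message
LOG_LINE = re.compile(
    r"^(?P<host>\S+) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>[^\]]+)\] "
    r"\[(?P<namespace>[^\]]+)\] "
    r"\[(?P<ident>[^\]]+)\] "   # meaning not stated in the slides
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] "
    "User login failed. Reason: wrong password for ..."
)
fields = parse_log_line(sample)
print(fields["host"], fields["level"], fields["namespace"])
```

Once the level and namespace are separate fields, counting "login errors per minute" becomes a simple aggregation rather than a text search.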




Counting and Timing
http://code.flickr.com/blog/2008/10/27/counting-timing/

Logster
https://github.com/etsy/logster

Forked from ganglia-logtailer:
- Daemon mode (only cron mode)
+ Support for Graphite
+ Simplified parsing scripts
web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0001        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0201        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0034        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web1101        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0201        [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
       web0055        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
       web0002        [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
       web0089        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0020        [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
       web1101        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0055        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0034        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0087        [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
       web0002        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0201        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0077        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0355        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0052        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0003        [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
       web0066        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
       web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling
Fatals   Errors   Warnings

★ runs out of cron
★ maintains a cursor into log files
★ supports ganglia and graphite
★ custom parsers much easier to write than gmetad
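The idea behind those bullets can be sketched in a few lines of Python. This is an illustration of the technique only, not Logster's actual parser API: the function names are made up here, the `webs.errorLog` prefix is borrowed from a metric name that appears later in the deck, and the per-level metric names are assumptions.

```python
import re
from collections import Counter

LEVEL = re.compile(r"\[(warning|error|fatal)\]")


def read_new_lines(path, offset):
    """Read what was appended after byte `offset`; return (lines, new_offset).

    The caller stores new_offset between cron runs, which is the "cursor".
    Assumes the log only ever contains complete lines when it is read.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data.decode("utf-8", "replace").splitlines(), offset + len(data)


def count_levels(lines):
    """Count how many lines carry each severity tag."""
    counts = Counter({"warning": 0, "error": 0, "fatal": 0})
    for line in lines:
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_graphite_lines(counts, timestamp, prefix="webs.errorLog"):
    """Format the counts as 'name value timestamp' lines."""
    return ["%s.%s %d %d" % (prefix, level, counts[level], timestamp)
            for level in sorted(counts)]


sample_lines = [
    "web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!",
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.",
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!",
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
]
counts = count_levels(sample_lines)
for metric_line in to_graphite_lines(counts, 1299230934):
    print(metric_line)
```

Run once a minute, this turns an unreadable wall of log lines into three numbers per interval that can be graphed.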




Apache access logs

LogFormat "%h %l %u %t \"%r\" %>s %b" common

LogFormat "%{X-Forwarded-For}i %{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
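Because the three numeric fields sit at the very end of the extended format, they can be pulled out without parsing the quoted request and header fields. The Python sketch below assumes exactly the field order shown above; the sample line is invented for illustration and is not real Etsy log data. `%D` is Apache's time to serve the request, in microseconds.

```python
def _to_int(field):
    """Apache logs '-' for an unset note; map that to None."""
    return None if field == "-" else int(field)


def request_timings(line):
    """Extract the last three space-separated fields of one access-log line."""
    _, memory_bytes, php_usec, total_usec = line.rstrip("\n").rsplit(" ", 3)
    return {
        "php_memory_usage_bytes": _to_int(memory_bytes),
        "php_time_microsec": _to_int(php_usec),
        "request_time_microsec": _to_int(total_usec),  # %D
    }


# Hypothetical line in the extended format; every value here is made up.
sample = (
    '10.0.0.1 - - - [04/Mar/2011:16:27:48 -0500] "GET /listing/1 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0" - abc123 www.etsy.com - req-uuid - - '
    "2097152 41250 48310"
)
timings = request_timings(sample)
print(timings)
```

Having PHP time and total request time side by side makes it possible to see how much of a slow request was spent outside the application code.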

%{etsy_ab_selections}n

%{etsy_uaid}n

Graphs

“If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it.” - Erik Kastner
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
StatsD
https://github.com/etsy/statsd/

StatsD::increment("logins.success");
StatsD::timing("gearman.time", $msec);
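Under the hood, a StatsD client does very little: it formats a short text packet and fires it at the daemon over UDP. Here is a minimal Python sketch of that wire format (`bucket:value|c` for counters, `bucket:value|ms` for timers). The function names, the `127.0.0.1` host and the 320 ms sample value are assumptions for illustration; 8125 is StatsD's customary default port.

```python
import socket


def counter_packet(bucket, delta=1):
    """Format a counter update, e.g. 'logins.success:1|c'."""
    return "%s:%d|c" % (bucket, delta)


def timing_packet(bucket, msec):
    """Format a timing sample in milliseconds, e.g. 'gearman.time:320|ms'."""
    return "%s:%d|ms" % (bucket, msec)


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP: no connection, no reply, no retry."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet.encode("ascii"), (host, port))
    finally:
        sock.close()


print(counter_packet("logins.success"))
print(timing_packet("gearman.time", 320))
```

UDP is the design choice that makes instrumentation cheap: if the daemon is down, the application loses a data point instead of blocking a request.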




90th pct
average
lower

StatsD::timing("gearman.time", $msec);
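The three lines on that graph come from summarizing all timing samples received in one flush interval. The Python sketch below shows the kind of summary involved; it is not StatsD's source, and the exact rounding rule for the percentile cut-off is an assumption. The sample values are made up.

```python
def summarize_timings(values_ms, pct=90):
    """Summarize one interval's timing samples.

    'upper_<pct>' is the largest sample left after setting aside the
    slowest (100 - pct) percent, so a few outliers do not dominate.
    """
    ordered = sorted(values_ms)
    count = len(ordered)
    if count == 0:
        return None
    dropped = int(round(count * (100 - pct) / 100.0))
    keep = max(1, count - dropped)
    return {
        "count": count,
        "lower": ordered[0],
        "upper": ordered[-1],
        "mean": sum(ordered) / float(count),
        "upper_%d" % pct: ordered[keep - 1],
    }


# Hypothetical samples for one interval, in milliseconds.
samples = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
summary = summarize_timings(samples)
print(summary)
```

Plotting the 90th percentile next to the average shows whether a slowdown affects everyone or only the tail.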




Ad hoc
name value timestamp

echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
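The same one-liner is just as short from application code. This Python sketch formats a line in the "name value timestamp" form and writes it to a TCP socket on port 2003, as the shell example does. The function names are made up here, and the host passed to `send_metric` is left to the caller.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one 'name value timestamp' line, newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (name, value, timestamp)


def send_metric(line, host, port=2003):
    """Open a TCP connection, write the line, and close."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


line = format_metric("events.deploy.site", 1, 1299274068)
print(line, end="")
```

A deploy script would call `send_metric(format_metric("events.deploy.site", 1), host)` once per deploy, with `host` set to its own Graphite server.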




Correlations

Trends + Events
target=drawAsInfinite(events.deploy.site)
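To overlay events on a trend, the event metric is added as an extra `target` wrapped in `drawAsInfinite(...)`, which draws each non-zero point as a vertical line. Below is a small Python sketch that builds such a render URL. The helper name and the `graphite.example.com` host are placeholders; the metric names come from the slides.

```python
from urllib.parse import urlencode


def render_url(base, targets, events=(), params=None):
    """Build a Graphite render URL with event metrics drawn as vertical lines."""
    query = sorted((params or {}).items())
    query += [("target", target) for target in targets]
    query += [("target", "drawAsInfinite(%s)" % event) for event in events]
    return "%s/render?%s" % (base.rstrip("/"), urlencode(query))


url = render_url(
    "http://graphite.example.com",       # placeholder host
    ["webs.errorLog.notExist"],          # the trend
    ["events.deploy.site"],              # the events
    {"from": "-1hours"},
)
print(url)
```

With deploys drawn on the same time axis as the error rate, "did that deploy break something?" can be answered by looking at one graph.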




What Happened?

Holt-Winters

"Forecasting Sales by Exponentially Weighted Moving Averages". Peter

"Aberrant Behavior Detection in Time Series for Network Monitoring".

"Holt-Winters Forecasting Applied to Poisson Processes in Real-Time".

holtWintersConfidence(Upper|Lower)

holtWintersAberration

business metrics with confidence bands == alertable business metrics
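Once a forecast supplies an upper and a lower band, the "aberration" is simply how far the actual value falls outside them, and any non-zero aberration is something to alert on. The Python sketch below takes the bands as given and does not implement the Holt-Winters forecast itself. The order counts and band values are invented for illustration.

```python
def aberration(values, lower, upper):
    """Distance outside the confidence band for each point (0 when inside).

    Positive means above the upper band, negative means below the lower band.
    Missing points (None) are treated as 0 here, which is an assumption.
    """
    result = []
    for value, low, high in zip(values, lower, upper):
        if value is None:
            result.append(0)
        elif value > high:
            result.append(value - high)
        elif value < low:
            result.append(value - low)
        else:
            result.append(0)
    return result


def should_alert(values, lower, upper):
    """True if any point left the band."""
    return any(delta != 0 for delta in aberration(values, lower, upper))


# Hypothetical orders-per-minute series with a flat band of 90..130.
orders = [100, 120, 80, 150]
low_band = [90, 90, 90, 90]
high_band = [130, 130, 130, 130]
print(aberration(orders, low_band, high_band))
```

The point of the bands is that the alert threshold follows the daily and weekly rhythm of the metric instead of being a fixed number that is wrong at 4 a.m.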


16,000 metrics in GRAPHITE
(plus 32,000 metrics in GANGLIA)

Dashboards
Hard
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
<img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
</a>




Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
$g->showDeploys(true);
echo $g->getDashboardHTML(280, 220);
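The slides do not show what the PHP `Graphite` class does internally. As a rough idea of how little such a wrapper needs, here is a hypothetical Python equivalent that produces the same kind of linked thumbnail as the "Hard" slide. The class, its method names, and the `graphite.example.com` host are all assumptions; the deploy metric names are taken from the hand-written URL.

```python
from html import escape
from urllib.parse import urlencode

# Deploy event metrics named in the hand-written URL on the "Hard" slide.
DEPLOY_METRICS = [
    "deploys.config.production",
    "deploys.web.production",
    "deploys.search.production",
    "deploys.imagestorage.other",
]


class GraphiteGraph:
    """Hypothetical helper; methods loosely mirror the PHP calls on the slide."""

    def __init__(self, base_url, time_range="-1hours"):
        self.base_url = base_url.rstrip("/")
        self.time_range = time_range
        self.title = ""
        self.metrics = []      # list of (target, color) pairs
        self.deploys = False

    def set_title(self, title):
        self.title = title

    def add_metric(self, target, color):
        self.metrics.append((target, color))

    def show_deploys(self, enabled):
        self.deploys = enabled

    def render_url(self, width, height, hide_legend=False):
        query = [("from", self.time_range), ("width", width),
                 ("height", height), ("title", self.title), ("yMin", 0)]
        if hide_legend:
            query.append(("hideLegend", 1))
        query += [("target", target) for target, _ in self.metrics]
        if self.deploys:
            query += [("target", "drawAsInfinite(%s)" % name)
                      for name in DEPLOY_METRICS]
        query.append(("colorList", ",".join(c for _, c in self.metrics)))
        return "%s/render?%s" % (self.base_url, urlencode(query))

    def dashboard_html(self, width, height):
        """Small thumbnail that links to an 800x600 version of the same graph."""
        full = escape(self.render_url(800, 600))
        thumb = escape(self.render_url(width, height, hide_legend=True))
        return '<a href="%s"><img src="%s"></a>' % (full, thumb)


graph = GraphiteGraph("http://graphite.example.com")  # placeholder host
graph.set_title("File Not Found")
graph.add_metric("webs.errorLog.notExist", "#00cc00")
graph.show_deploys(True)
snippet = graph.dashboard_html(280, 220)
print(snippet)
```

Five lines of calling code instead of a hand-assembled URL is the difference between engineers adding a graph to a dashboard and not bothering.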




48 dashboards by 32 engineers

Application health

High-level visibility

Low MTTD

Confidence

Make metrics

Not that much

codeascraft.etsy.com
github.com/etsy/statsd
github.com/etsy/logster
bitbucket.org/maplebed/ganglia-logtailer

Questions?

Metrics driven engineering (velocity 2011)
