Operational Efficiency Hacks
                          John Allspaw
          Operations Engineering, Flickr
who am I?
Manage the Flickr Operations group
Wrote a geeky book:
“Efficiencies”
   Doing more with the robots you’ve got
   Doing more with the humans you’ve got
Some optimization “rules”

-   Don’t rely on being able to tweak anything.
-   Don’t waste too much time tuning when you have no evidence it’ll matter.
Optimization “rules”
[Chart: performance tuning gains (vertical axis) against time spent tuning (horizontal axis)]
Optimization “rules”

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
                         Knuth (or Hoare)
however...
Optimization “rules”

That doesn’t give us an excuse to be lazy and inefficient.

We lean on the experience of people in the community for evidence that a given tuning might be worthwhile.
Optimization “rules”

“Yet we should not pass up our opportunities in that critical 3 percent.”
                       Knuth (or Hoare)
So...
[Chart annotations: “obvious tuning wins”; “stop somewhere in here”; “OMG I'm wasting !@#$ time for no reason”]
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
-   120k/sec of MySQL reads
-   6 PB of photos
-   10 TB of storage eaten per day
-   15,362 service monitors (alerts)
Infrastructure Hacks

-   Examples of what changing software can do
          (plain old-fashioned performance tuning)
-   Examples of what changing hardware can do
                              (yay for Mr. Moore!)
Leaning on compilers
(synthetic PHP benchmarks, not real-world)

(http://sebastian-bergmann.de/archives/634-PHP-GCC-ICC-Benchmark.html)
PHP (real-world)




PHP 4.4.8 to PHP 5.2.8 migration
Can now handle more with less

same taste, less filling
Image Processing

-   In 2004, Flickr was using ImageMagick (version 6.1.9) for image processing
-   Changed to GraphicsMagick (version 1.1.5), about 15% faster at the time
-   We only need a subset of ImageMagick’s features for our purposes anyway
Image Processing

-   OpenMP support
    (http://en.wikipedia.org/wiki/Openmp)
    - Allows parallelization of processing jobs, with multiple cores working on the same image
    - Some algorithms parallelize better than others
Image Processing
-   Test script
     -   7 large-ish DSLR photos
     -   Cascade resizing each to 6 smaller sizes, semi-typical for Flickr’s workload
     -   Each resize processed serially
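A test along these lines could be sketched as follows. This is not Flickr's actual script: the file names, the six target sizes, and the output naming are assumptions made for illustration. The sketch only builds `gm convert ... -resize` command lines so that the cascade is visible (each smaller size is derived from the previous, larger result). Running them would need GraphicsMagick installed.

```python
from pathlib import Path

# Hypothetical bounding-box sizes, largest first; the slides do not list the real ones.
SIZES = [1024, 800, 640, 500, 240, 100]


def cascade_commands(source, sizes=SIZES):
    """Build one `gm convert` command per size, each reading the previous output."""
    commands = []
    current = Path(source)
    stem = current.stem
    for edge in sizes:
        target = current.with_name(f"{stem}_{edge}.jpg")
        # -resize WxH fits the image inside a W x H box, keeping the aspect ratio.
        commands.append(
            ["gm", "convert", str(current), "-resize", f"{edge}x{edge}", str(target)]
        )
        current = target  # cascade: the next, smaller size starts from this result
    return commands


if __name__ == "__main__":
    # Each resize is listed serially, matching the test described above.
    for photo in ["photo1.jpg", "photo2.jpg"]:  # placeholder file names
        for argv in cascade_commands(photo):
            print(" ".join(argv))
```

Timing the same command list under different builds (compiler, OpenMP on or off) would give the kind of comparison the following slides chart.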
Image Processing
   compiler differences




(GM version 1.1.14, non-OpenMP)
Image Processing
          OpenMP differences


                    OpenMP
                    advantage




(gcc 4.1.2, on quad core Xeon L5335 @ 2.00GHz)
Image Processing
   CPU differences
Diagonal Scaling
-   Vertically scaling your already horizontally scaled nodes
-   a.k.a. “tech refresh”
-   a.k.a. “Moore’s Law Surfing”
Diagonal Scaling
We replaced 67 “old” webservers with 18 “new”:

servers   CPUs per server   RAM per server   drives per server   total power (W) @ 60% peak
67        2                 4GB              1x80GB              8763.6
18        8                 4GB              1x146GB             2332.8

~70% LESS power
49U LESS rack space
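The headline figures can be rechecked from the table with a few lines of arithmetic. The sketch below uses only the numbers on the slide. The assumption that each webserver occupies 1U is an inference from the 49U figure (67 - 18 = 49), not something the slide states.

```python
old_servers, new_servers = 67, 18
old_power_w, new_power_w = 8763.6, 2332.8  # total watts at 60% of peak, from the table

power_saving = 1 - new_power_w / old_power_w
rack_units_freed = old_servers - new_servers  # assumes 1U per server (not stated on the slide)

print(f"power saving: {power_saving:.1%}")
print(f"rack units freed: {rack_units_freed}U")
print(f"watts per old server: {old_power_w / old_servers:.1f}")
print(f"watts per new server: {new_power_w / new_servers:.1f}")
```

With these inputs the saving comes out near 73%, consistent with the slide's "~70% LESS power". Per-server draw is almost unchanged (about 130 W each), so the saving comes from running fewer boxes.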
Diagonal Scaling
We replaced 23 “old” image processing boxes with 8 “new”:

servers   photos/min   rack (U)   total power (W) @ 60% peak
23        1035         23         3008.4
8         1120         8          1036.8

~75% FASTER
15U LESS rack space
65% LESS power

[Photos: “from this” and “to this”]
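The table's rack and power claims, plus the per-box throughput it implies, can be rechecked with a short calculation. This is a sketch that uses only the table values; the dictionary layout and the `per_box` helper are illustrative names, not anything from the talk.

```python
old = {"boxes": 23, "photos_per_min": 1035, "rack_u": 23, "power_w": 3008.4}
new = {"boxes": 8, "photos_per_min": 1120, "rack_u": 8, "power_w": 1036.8}


def per_box(cluster):
    """Throughput of a single box, in photos per minute."""
    return cluster["photos_per_min"] / cluster["boxes"]


power_saving = 1 - new["power_w"] / old["power_w"]
rack_saving_u = old["rack_u"] - new["rack_u"]

print(f"photos/min per old box: {per_box(old):.0f}")
print(f"photos/min per new box: {per_box(new):.0f}")
print(f"rack space saved: {rack_saving_u}U")
print(f"power saving: {power_saving:.1%}")
print(f"photos/min per watt, old: {old['photos_per_min'] / old['power_w']:.2f}")
print(f"photos/min per watt, new: {new['photos_per_min'] / new['power_w']:.2f}")
```

The rack and power figures match the slide (15U and roughly 65%). The "~75% FASTER" figure is not derivable from these columns alone, so the sketch does not try to reproduce it.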
What do you do with old/slow machines?

-   Liquidate
-   Re-purpose as dev/staging/etc
-   “offline” tasks
Offline Tasks
-   Out-of-band/asynchronous queuing and execution system, for non-realtime tasks
-   See here:
    http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/

-   See Myles Grant talk about it more here:
    http://en.oreilly.com/velocity2009/public/schedule/detail/7552
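The general shape of an out-of-band queue can be sketched with Python's standard library. This illustrates the idea only and is not Flickr's system, which the linked post describes. The `OfflineQueue` class and its method names are hypothetical.

```python
import queue
import threading


class OfflineQueue:
    """Accept tasks immediately and run them later on background workers."""

    def __init__(self, worker_count=2):
        self._pending = queue.Queue()
        self._lock = threading.Lock()
        self.completed = []  # (task name, status) pairs, in completion order
        for _ in range(worker_count):
            threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, name, func):
        """Return at once; the request path never waits for the task to run."""
        self._pending.put((name, func))

    def wait(self):
        """Block until every queued task has been processed."""
        self._pending.join()

    def _work(self):
        while True:
            name, func = self._pending.get()
            try:
                func()
                status = "ok"
            except Exception as exc:  # record the failure, keep the worker alive
                status = f"failed: {exc}"
            with self._lock:
                self.completed.append((name, status))
            self._pending.task_done()


if __name__ == "__main__":
    q = OfflineQueue()
    q.enqueue("rebuild-thumbnails", lambda: None)  # placeholder task
    q.wait()
    print(q.completed)
```

Because the workers only need to keep up on average, this is the kind of load that older, slower machines can absorb.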
Runbook Hacks

“WTF HAPPENED LAST NIGHT?!”
Why?
As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand
Some of the How:
  -   teach machines to build themselves
  -   teach machines to watch themselves
  -   teach machines to fix themselves
  -   reduce MTTR by streamlining
Automated Infrastructure
-   If there is only one thing you do, automatic configuration and deployment management should be it.
-   See:
     -     Opscode/Chef (http://opscode.com/)
     -     Puppet (http://reductivelabs.com/products/puppet/)
     -     System Imager/Configurator (http://wiki.systemimager.org)
Configuration Management Codeswarm
Time
Machine time is cheaper than human time.

If a failure results in some commands being run to ‘fix’ it, make the machines do it.

(i.e., don’t wake people up for stupid things!)
Aggregate Monitoring

Don’t care about single nodes; only care about the delta change of metrics/faults
   -   Warn (email) on X % change
   -   Page (wake up) on Y % change
High and low water marks for some metrics
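The warn/page rule can be sketched as a percentage change on a cluster-wide total. This is a minimal illustration: the function names, the threshold values, and the choice to sum per-node values are assumptions, since the slide leaves X and Y unspecified.

```python
def percent_change(previous, current):
    """Signed change from previous to current, as a percentage of previous."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a percentage")
    return (current - previous) / previous * 100.0


def classify_cluster(previous_by_node, current_by_node, warn_pct, page_pct):
    """Look only at the aggregate, never at an individual node."""
    before = sum(previous_by_node.values())
    after = sum(current_by_node.values())
    change = abs(percent_change(before, after))
    if change >= page_pct:
        return "page"  # wake someone up
    if change >= warn_pct:
        return "warn"  # email, read it tomorrow
    return "ok"


if __name__ == "__main__":
    before = {"www1": 500, "www2": 500}
    after = {"www1": 0, "www2": 1100}  # one node died, the other picked up the load
    print(classify_cluster(before, after, warn_pct=5, page_pct=25))
```

In the example one node drops to zero, yet the aggregate moves by only 10%, so the result is a warning instead of a page.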
Self-Healing
Make service monitoring fix common failure scenarios, and notify us later about it.

Daemons/processes run on machines, take corrective action under certain conditions, and report back with what they did.

This can greatly reduce your mean time to recovery (MTTR).
Basic Apache Example

1. Webserver not running?
2. Under certain conditions, try to start it, and email that this happened. (I’ll read it tomorrow)
3. Won’t start? Assume something’s really wrong, so don’t keep trying (email that, too)
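The three steps above can be sketched as a small state machine. This is an illustration under stated assumptions, not the monitoring code from the talk: the `WebserverHealer` class is hypothetical, and the health check, start command, and email sender are passed in as plain callables so the logic stays independent of any particular tool.

```python
class WebserverHealer:
    """Try one restart when the webserver is down, then stop trying."""

    def __init__(self, is_running, start, notify, may_start=lambda: True):
        self.is_running = is_running  # callable returning True when the server is up
        self.start = start            # callable that attempts to start the server
        self.notify = notify          # callable taking a message (e.g. sends email)
        self.may_start = may_start    # the "under certain conditions" gate
        self.gave_up = False

    def check(self):
        if self.is_running():
            self.gave_up = False  # healthy again, so re-arm for the next failure
            return "running"
        if self.gave_up:
            return "gave_up"      # step 3: do not keep trying
        if not self.may_start():
            return "skipped"
        self.start()
        if self.is_running():
            self.notify("webserver was down; started it")
            return "restarted"
        self.gave_up = True
        self.notify("webserver was down and would not start; not retrying")
        return "failed"


if __name__ == "__main__":
    state = {"up": False}
    healer = WebserverHealer(
        is_running=lambda: state["up"],
        start=lambda: state.update(up=True),  # stand-in for a real start command
        notify=print,
    )
    print(healer.check())
```

Calling `check()` from a periodic monitor gives one restart attempt per outage and exactly one email for either outcome.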
MySQL Self-Healing
   Some MySQL issues “fixed” by the machines

- Kill long-running SELECT queries (marked safe to kill)
- Queries not safe to kill are marked by the application as “NO KILL” in comments
- Run EXPLAIN on killed queries, and report the results
- Keep track of the query types and databases that need the most killing, and produce a “DBs that Suck” report
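The selection logic can be sketched over rows shaped like the output of `SHOW FULL PROCESSLIST` (columns `Id`, `db`, `Command`, `Time`, `Info`). Several details are assumptions, not Flickr's implementation: the 60-second threshold, the use of `KILL QUERY`, the helper names, and treating any SELECT without the marker as safe to kill. The sketch only builds statements; it does not connect to a database.

```python
from collections import Counter


def queries_to_kill(processlist, max_seconds=60):
    """Pick long-running SELECTs that the application has not marked NO KILL."""
    victims = []
    for row in processlist:
        sql = (row.get("Info") or "").lstrip()
        if row.get("Command") != "Query":
            continue  # sleeping or replication threads are left alone
        if not sql.upper().startswith("SELECT"):
            continue  # never kill writes
        if "NO KILL" in sql:
            continue  # the application marked this query as unsafe to kill
        if row.get("Time", 0) > max_seconds:
            victims.append(row)
    return victims


def statements_for(victims):
    """KILL and EXPLAIN statements to run for each victim."""
    kills = [f"KILL QUERY {row['Id']}" for row in victims]
    explains = [f"EXPLAIN {row['Info'].strip()}" for row in victims]
    return kills, explains


def dbs_that_suck(victims):
    """Databases ranked by how many of their queries were killed."""
    return Counter(row["db"] for row in victims).most_common()


if __name__ == "__main__":
    rows = [  # made-up sample rows
        {"Id": 1, "db": "photos", "Command": "Query", "Time": 120, "Info": "SELECT * FROM a"},
        {"Id": 2, "db": "photos", "Command": "Query", "Time": 120,
         "Info": "SELECT /* NO KILL */ * FROM b"},
    ]
    killed = queries_to_kill(rows)
    print(statements_for(killed))
    print(dbs_that_suck(killed))
```

Keeping the decision in a pure function makes it easy to test the rules against captured process lists before letting anything issue a real `KILL`.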
MySQL Self-Healing
  Some MySQL replication issues “fixed” by the machines, by error
- Skip errors that can safely be skipped and restart slave threads
- Force refetch of replication binlogs on:
       - 1064 (ER_PARSE_ERROR)
- Re-run queries on:
       - 1205 (ER_LOCK_WAIT_TIMEOUT)
       - 1213 (ER_LOCK_DEADLOCK)
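The per-error handling above amounts to a lookup from the replication error number to an action. A minimal sketch follows. The action names and the fallback of paging a human are assumptions, and the set of skippable errors is left to the caller because the slide does not say which errors Flickr treated as safe to skip.

```python
REFETCH_BINLOGS = frozenset({1064})    # ER_PARSE_ERROR, from the slide
RERUN_QUERY = frozenset({1205, 1213})  # ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK, from the slide


def action_for(errno, safe_to_skip=frozenset()):
    """Map a replication error number to the corrective action to attempt."""
    if errno in safe_to_skip:
        return "skip_and_restart_slave"
    if errno in REFETCH_BINLOGS:
        return "refetch_binlogs"
    if errno in RERUN_QUERY:
        return "rerun_query"
    return "page_a_human"  # assumption: anything unrecognized goes to a person


if __name__ == "__main__":
    for errno in (1064, 1205, 1213, 9999):
        print(errno, action_for(errno))
```

Keeping the table explicit means an unknown error is never "fixed" automatically by accident.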
Troubleshooting
Code and Config Deploy Logs

1. ESSENTIAL
2. MANDATORY
Communications
•   Internal IRC

      -   For ongoing discussions
      -   Logged, so “infinite” scrollback
•   IM Bot (built on libyahoo2.sf.net)

      -   For production changes
      -   Broadcasts all to all contacts
      -   Logged, and injected into IRC
      -   IM Status = who is in primary/secondary on-call

•   All of IRC and IM Bot slurped into a search index
[Screenshot of the deploy log, annotated: when, what, detailed what*, who; time of last deploy at top of Ganglia]

*also points to what commands should be used to back out the changes
IM Bot (timestamps help correlation)

All IRC and IM bot traffic goes into a searchable history
Morals of Our Stories

-   Optimizations can be a Very Good Thing™
-   Weigh time spent optimizing against expected gains
-   Lean on others to learn what “expected gains” mean in different scenarios
-   Plain old-fashioned intuition
Some Wisdom Nuggets

     Jon Prall’s 85 WebOps Rules:
  http://jprall.vox.com/library/post/85-operations-rules-to-live-by.html
Questions?

http://www.flickr.com/photos/ebarney/3348965637/
http://www.flickr.com/photos/dgmiller/1606071911/
http://www.flickr.com/photos/dannyboyster/60371673/
http://www.flickr.com/photos/bright/189338394/
http://www.flickr.com/photos/nickwheeleroz/2475011402/
http://www.flickr.com/photos/dramaqueennorma/191063346/
http://www.flickr.com/photos/telstar/2861103147/
http://www.flickr.com/photos/norby/2309046043/
http://www.flickr.com/photos/allysonk/201008992/

Weitere Àhnliche Inhalte

Was ist angesagt?

Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...
Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...
Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...Markus Michalewicz
 
Sulu LE CMS Ultime
Sulu LE CMS UltimeSulu LE CMS Ultime
Sulu LE CMS UltimeJulien Vinber
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Ontico
 
MySQL InnoDB Cluster and Group Replication in a Nutshell
MySQL InnoDB Cluster and Group Replication in a NutshellMySQL InnoDB Cluster and Group Replication in a Nutshell
MySQL InnoDB Cluster and Group Replication in a NutshellFrederic Descamps
 
Oracle Database | Computer Science
Oracle Database | Computer ScienceOracle Database | Computer Science
Oracle Database | Computer ScienceTransweb Global Inc
 
しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1
しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1
しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1ă‚Șăƒ©ă‚Żăƒ«ă‚šăƒłă‚žăƒ‹ă‚ąé€šäżĄ
 
MySQL Database Architectures - 2020-10
MySQL Database Architectures -  2020-10MySQL Database Architectures -  2020-10
MySQL Database Architectures - 2020-10Kenny Gryp
 
MySQL8.0 in COSCUP2017
MySQL8.0 in COSCUP2017MySQL8.0 in COSCUP2017
MySQL8.0 in COSCUP2017Shinya Sugiyama
 
MySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your BusinessMySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your BusinessSveta Smirnova
 
MySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœž
MySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœžMySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœž
MySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœžă‚Șăƒ©ă‚Żăƒ«ă‚šăƒłă‚žăƒ‹ă‚ąé€šäżĄ
 
Introduction to Ansible
Introduction to AnsibleIntroduction to Ansible
Introduction to AnsibleKnoldus Inc.
 
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And Whatudaymoogala
 
New availability features in oracle rac 12c release 2 anair ss
New availability features in oracle rac 12c release 2 anair   ssNew availability features in oracle rac 12c release 2 anair   ss
New availability features in oracle rac 12c release 2 anair ssAnil Nair
 
찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅
찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅
찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅Amazon Web Services Korea
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 
Real Application Testing
Real Application TestingReal Application Testing
Real Application Testingoracleonthebrain
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOAltinity Ltd
 

Was ist angesagt? (20)

Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...
Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...
Oracle RAC 12c (12.1.0.2) Operational Best Practices - A result of true colla...
 
Sulu LE CMS Ultime
Sulu LE CMS UltimeSulu LE CMS Ultime
Sulu LE CMS Ultime
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
 
Planning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera ClusterPlanning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera Cluster
 
MySQL InnoDB Cluster and Group Replication in a Nutshell
MySQL InnoDB Cluster and Group Replication in a NutshellMySQL InnoDB Cluster and Group Replication in a Nutshell
MySQL InnoDB Cluster and Group Replication in a Nutshell
 
Oracle Database | Computer Science
Oracle Database | Computer ScienceOracle Database | Computer Science
Oracle Database | Computer Science
 
しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1
しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1
しばちょう慈生がèȘžă‚‹ïŒă‚Șăƒ©ă‚Żăƒ«ăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čたé€ČćŒ–ăźæ­ŽćČăšæœ€æ–°æŠ€èĄ“ć‹•ć‘#1
 
Esper - CEP Engine
Esper - CEP EngineEsper - CEP Engine
Esper - CEP Engine
 
MySQL Database Architectures - 2020-10
MySQL Database Architectures -  2020-10MySQL Database Architectures -  2020-10
MySQL Database Architectures - 2020-10
 
MySQL8.0 in COSCUP2017
MySQL8.0 in COSCUP2017MySQL8.0 in COSCUP2017
MySQL8.0 in COSCUP2017
 
MySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your BusinessMySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your Business
 
MySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœž
MySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœžMySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœž
MySQL Technology Cafe #12 MDS HA怜蚌 ïœžăƒ‘ăƒ©ăƒĄăƒŒă‚żă‹ă‚‰ăƒ‘ăƒ•ă‚©ăƒŒăƒžăƒłă‚čăŸă§ïœž
 
Introduction to Ansible
Introduction to AnsibleIntroduction to Ansible
Introduction to Ansible
 
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
 
New availability features in oracle rac 12c release 2 anair ss
New availability features in oracle rac 12c release 2 anair   ssNew availability features in oracle rac 12c release 2 anair   ss
New availability features in oracle rac 12c release 2 anair ss
 
찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅
찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅
찟아가는 AWS ì„žëŻžë‚˜(ê”ŹëĄœ,가산,판ꔐ) - ë©”ê°€ìĄŽêłŒ 핚께하는 큎띌우드 컎퓚팅
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 
Real Application Testing
Real Application TestingReal Application Testing
Real Application Testing
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
Ansible
AnsibleAnsible
Ansible
 

Andere mochten auch

Awareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationAwareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationChristian Glahn
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comJohn Allspaw
 
Programação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaProgramação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaAdriana Rocha
 
Reconociendo habilidades
Reconociendo habilidadesReconociendo habilidades
Reconociendo habilidadescarorubi1
 
Saludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄ
Saludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄSaludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄ
Saludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄmirepanama
 
Pelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringPelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringMilling and Grain magazine
 
Annual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormAnnual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormPaulo de Tarso Molina
 
Strategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property ManagementStrategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property Managementfhanley
 
FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)NLandUSA
 
B2b newsletter_april
B2b newsletter_aprilB2b newsletter_april
B2b newsletter_aprilRene Jack
 
Cyclus-US Portfolio
Cyclus-US PortfolioCyclus-US Portfolio
Cyclus-US PortfolioXDS Marketing.
 
La gestiĂłn de la elecciĂłn gd e e mail
La gestiĂłn de la elecciĂłn gd e e mailLa gestiĂłn de la elecciĂłn gd e e mail
La gestiĂłn de la elecciĂłn gd e e mailmsanchezm
 
Aguahidratacion cuso4b
Aguahidratacion cuso4bAguahidratacion cuso4b
Aguahidratacion cuso4bJ M
 
Final.gest.hum.1
Final.gest.hum.1Final.gest.hum.1
Final.gest.hum.1edwin montero
 

Andere mochten auch (20)

Awareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationAwareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulation
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.com
 
Programação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaProgramação semana de extensão 2012 folia
Programação semana de extensão 2012 folia
 
Yoga ocular
Yoga ocularYoga ocular
Yoga ocular
 
Reconociendo habilidades
Reconociendo habilidadesReconociendo habilidades
Reconociendo habilidades
 
Saludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄ
Saludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄSaludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄ
Saludo protocolar al cuerpo diplomĂĄtico acreditado en panamĂĄ
 
Pelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringPelleting: The link between practice and engineering
Pelleting: The link between practice and engineering
 
Annual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormAnnual Subsea Meeting Registration Form
Annual Subsea Meeting Registration Form
 
Strategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property ManagementStrategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property Management
 
EBT Health & Safety
EBT Health & SafetyEBT Health & Safety
EBT Health & Safety
 
Pulpo congelado en españa
Pulpo congelado en españaPulpo congelado en españa
Pulpo congelado en españa
 
FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)
 
B2b newsletter_april
B2b newsletter_aprilB2b newsletter_april
B2b newsletter_april
 
Cardiodiagnostico
CardiodiagnosticoCardiodiagnostico
Cardiodiagnostico
 
Cyclus-US Portfolio
Cyclus-US PortfolioCyclus-US Portfolio
Cyclus-US Portfolio
 
La gestiĂłn de la elecciĂłn gd e e mail
La gestiĂłn de la elecciĂłn gd e e mailLa gestiĂłn de la elecciĂłn gd e e mail
La gestiĂłn de la elecciĂłn gd e e mail
 
Aguahidratacion cuso4b
Aguahidratacion cuso4bAguahidratacion cuso4b
Aguahidratacion cuso4b
 
Final.gest.hum.1
Final.gest.hum.1Final.gest.hum.1
Final.gest.hum.1
 
Ba38
Ba38Ba38
Ba38
 
Numero
NumeroNumero
Numero
 

Ähnlich wie Operational Efficiency Hacks Web20 Expo2009

Operational Efficiency Hacks Web20 Expo2009

  ‱ 1. Operational Efficiency Hacks. John Allspaw, Operations Engineering, Flickr
  ‱ 2. who am I? Manage the Flickr Operations group. Wrote a geeky book:
  • 4. “EfïŹciencies” Doing more with the robots you’ve got
  • 5. “EfïŹciencies” Doing more with the robots you’ve got Doing more with the humans you’ve got
  ‱ 8. Some optimization “rules”
       - Don’t rely on being able to tweak anything.
       - Don’t waste too much time tuning when you have no evidence it’ll matter.
  ‱ 10. Optimization “rules”
       “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” Knuth (or Hoare)
  ‱ 15. Optimization “rules”
       That doesn’t give us an excuse to be lazy and inefficient.
       We lean on the experience of people in the community for evidence that tuning(s) might be a worthwhile thing to do.
  ‱ 16. Optimization “rules”
       “Yet we should not pass up our opportunities in that critical 3 percent.” Knuth (or Hoare)
  ‱ 17. So... [Chart of performance tuning gains against time spent tuning, labeled in order: “obvious tuning wins”, “stop somewhere in here”, “OMG I'm wasting !@#$ time for no reason”]
  ‱ 24. Our Context
       - 24 TB of MySQL data
       - 32k/sec of MySQL writes
       - 120k/sec of MySQL reads
       - 6 PB of photos
       - 10 TB of storage eaten per day
       - 15,362 service monitors (alerts)
  ‱ 27. Infrastructure Hacks
       - Examples of what changing software can do (plain old-fashioned performance tuning)
       - Examples of what changing hardware can do (yay for Mr. Moore!)
  ‱ 28. Leaning on compilers (synthetic PHP benchmarks, not real-world) (http://sebastian-bergmann.de/archives/634-PHP-GCC-ICC-Benchmark.html)
  ‱ 29. PHP (real-world): php 4.4.8 to php 5.2.8 migration
  ‱ 30. Can now handle more with less: same taste, less filling
  ‱ 34. Image Processing
       - In 2004, Flickr was using ImageMagick for image processing (version 6.1.9)
       - Changed to GraphicsMagick, about 15% faster at the time (version 1.1.5)
       - Only need a subset of ImageMagick features anyway for our purposes
  ‱ 35. Image Processing
       - OpenMP support (http://en.wikipedia.org/wiki/Openmp)
       - Allows parallelization of processing jobs, using multiple cores working on the same image
       - Some algorithms have more parallelization than others
  ‱ 36. Image Processing
       - Test script
       - 7 large-ish DSLR photos
       - Cascade resizing each to 6 smaller sizes, semi-typical for Flickr’s workload
       - Each resize processed serially
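The test script itself is not shown in the slides. A minimal sketch of what such a harness might look like follows. It assumes "cascade" means each smaller size is generated from the previous, larger output; the six sizes and the output naming are hypothetical, and `run` is an injected callable so the timing loop can be exercised without GraphicsMagick installed.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical target sizes (longest edge, in pixels). The slides say only
# "6 smaller sizes" and do not list the real ones.
SIZES = [1024, 500, 240, 100, 75, 48]


def cascade_commands(src: str, sizes: Sequence[int] = SIZES) -> List[List[str]]:
    """Build one `gm convert` command per size; each reads the previous output."""
    commands = []
    current = src
    stem = src.rsplit(".", 1)[0]
    for size in sizes:
        out = f"{stem}_{size}.jpg"
        commands.append(["gm", "convert", current, "-resize", f"{size}x{size}", out])
        current = out  # cascade: the next resize starts from this smaller image
    return commands


def time_cascade(photos: Sequence[str],
                 run: Callable[[List[str]], object],
                 sizes: Sequence[int] = SIZES) -> float:
    """Process every resize serially and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for photo in photos:
        for cmd in cascade_commands(photo, sizes):
            run(cmd)
    return time.perf_counter() - start
```

To time real resizes, one could pass `lambda cmd: subprocess.run(cmd, check=True)` as `run` (after `import subprocess`) and repeat the measurement for each compiler, OpenMP setting, or CPU being compared.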
  ‱ 37. Image Processing: compiler differences (GM version 1.1.14, non-OpenMP)
  ‱ 38. Image Processing: OpenMP differences, OpenMP advantage (gcc 4.1.2, on quad core Xeon L5335 @ 2.00GHz)
  ‱ 39. Image Processing: CPU differences
  ‱ 43. Diagonal Scaling
       - Vertically scaling your already horizontally-scaled nodes
       - a.k.a. “tech refresh”
       - a.k.a. “Moore’s Law Surfing”
  ‱ 47. Diagonal Scaling
       We replaced 67 “old” webservers with 18 “new”:

       servers   CPUs per server   RAM per server   drives per server   total power (W) @60% peak
       67        2                 4GB              1x80GB              8763.6
       18        8                 4GB              1x146GB             2332.8

       ~70% LESS power
       49U LESS rack space
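The savings on this slide can be checked from the table's own numbers. The sketch below is only arithmetic on the figures shown; the one assumption, not stated in the slides, is that each server occupies 1U, which is what makes the rack-space figure equal the difference in server count.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100.0


# Figures copied from the slide's table.
old_servers, new_servers = 67, 18
old_power_w, new_power_w = 8763.6, 2332.8

power_saved_pct = percent_reduction(old_power_w, new_power_w)
rack_units_saved = old_servers - new_servers  # assumes 1U per server

# Per-server draw, to show where the saving comes from.
old_watts_each = old_power_w / old_servers
new_watts_each = new_power_w / new_servers

print(f"{power_saved_pct:.1f}% less power, {rack_units_saved}U less rack space")
print(f"{old_watts_each:.1f} W vs {new_watts_each:.1f} W per server")
```

Per-server draw is nearly identical for old and new machines (about 130 W each), so the power saving comes almost entirely from running fewer boxes.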
  ‱ 54. Diagonal Scaling
       We replaced 23 “old” image processing boxes with 8 “new”:

       servers   photos/min   rack (U)   total power (W) @60% peak
       23        1035         23         3008.4
       8         1120         8          1036.8

       ~75% FASTER
       15U LESS rack space
       65% LESS power
       [Before and after photos: “from this” / “to this”]
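One way to read this table is per box and per watt rather than per fleet. The sketch below derives both from the slide's figures; the `Fleet` class and its field names are illustrative, not something from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    """One row of the slide's table (names are illustrative)."""
    servers: int
    photos_per_min: float
    power_w: float

    @property
    def photos_per_server(self) -> float:
        return self.photos_per_min / self.servers

    @property
    def photos_per_watt(self) -> float:
        return self.photos_per_min / self.power_w


old = Fleet(servers=23, photos_per_min=1035, power_w=3008.4)
new = Fleet(servers=8, photos_per_min=1120, power_w=1036.8)

per_box_speedup = new.photos_per_server / old.photos_per_server
per_watt_gain = new.photos_per_watt / old.photos_per_watt
power_saved_pct = (old.power_w - new.power_w) / old.power_w * 100.0

print(f"{old.photos_per_server:.0f} -> {new.photos_per_server:.0f} photos/min per box")
print(f"{per_box_speedup:.1f}x per box, {per_watt_gain:.1f}x per watt, "
      f"{power_saved_pct:.1f}% less power")
```

Each new box handles roughly three times the photos per minute of an old one, which is why 8 of them can slightly out-produce the 23 they replaced.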
  ‱ 58. What do you do with old/slow machines?
       - Liquidate
       - Re-purpose as dev/staging/etc
       - “offline” tasks
  • 60. OfïŹ‚ine Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks
  • 61. OfïŹ‚ine Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here:
  • 62. OfïŹ‚ine Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.ïŹ‚ickr.com/blog/2008/09/26/ïŹ‚ickr-engineers-do-it-ofïŹ‚ine/
  • 63. OfïŹ‚ine Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.ïŹ‚ickr.com/blog/2008/09/26/ïŹ‚ickr-engineers-do-it-ofïŹ‚ine/ - See Myles Grant talk about it more here:
  • 64. OfïŹ‚ine Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.ïŹ‚ickr.com/blog/2008/09/26/ïŹ‚ickr-engineers-do-it-ofïŹ‚ine/ - See Myles Grant talk about it more here: http://en.oreilly.com/velocity2009/public/schedule/detail/7552
  ‱ 65. Runbook Hacks: “WTF HAPPENED LAST NIGHT?!”
  ‱ 72. Why?
       As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand.
       Some of the How:
       - teach machines to build themselves
       - teach machines to watch themselves
       - teach machines to fix themselves
       - reduce MTTR by streamlining
  ‱ 75. Automated Infrastructure
       - If there is only one thing you do, automatic configuration and deployment management should be it.
       - See:
         - Opscode/Chef (http://opscode.com/)
         - Puppet (http://reductivelabs.com/products/puppet/)
         - System Imager/Configurator (http://wiki.systemimager.org)
  ‱ 77. Time
       Machine time is cheaper than human time. If a failure results in some commands being run to ‘fix’ it, make the machines do it. (i.e., don’t wake people up for stupid things!)
  ‱ 80. Aggregate Monitoring
       Don’t care about single nodes, only care about delta change of metrics/faults
       - Warn (email) on X % change
       - Page (wake up) on Y % change
       High and low water marks for some metrics
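The warn/page rule above can be sketched as a small classifier over an aggregate metric, for example total requests per second across a cluster. The function names and the example thresholds are hypothetical; the slides leave X and Y unspecified.

```python
def percent_change(previous: float, current: float) -> float:
    """Absolute change between two samples, as a percentage of the previous one."""
    if previous == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) / previous * 100.0


def alert_level(previous: float, current: float,
                warn_pct: float, page_pct: float) -> str:
    """Return 'page', 'warn', or 'ok' for a change in an aggregate metric."""
    change = percent_change(previous, current)
    if change >= page_pct:
        return "page"   # wake someone up
    if change >= warn_pct:
        return "warn"   # send an email
    return "ok"


def outside_water_marks(value: float, low: float, high: float) -> bool:
    """True when a metric falls below its low mark or rises above its high mark."""
    return value < low or value > high
```

With `warn_pct=10` and `page_pct=30`, a cluster-wide drop from 1000 to 850 requests per second would send an email, while a drop to 500 would page. A single node failing out of many barely moves the aggregate and so stays quiet.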
  • 87. Self-Healing
      - Make service monitoring fix common failure scenarios, and notify us later about it.
      - Daemons/processes run on machines, take corrective action under certain conditions, and report back with what they did.
      - Can greatly reduce your mean time to recovery (MTTR).
  • 91. Basic Apache Example
      1. Webserver not running?
      2. Under certain conditions, try to start it, and email that this happened. (I’ll read it tomorrow)
      3. Won’t start? Assume something’s really wrong, so don’t keep trying (email that, too)
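The three steps above can be sketched as a single check that a cron job or monitoring plugin might run. The function and parameter names here are hypothetical; the health check, start command and email delivery are passed in as callables because the slides do not say how those were implemented, and the "certain conditions" gate is left to the caller.

```python
from typing import Callable


def heal_webserver(
    is_running: Callable[[], bool],   # e.g. a process or HTTP health check
    start: Callable[[], None],        # e.g. a wrapper around the init script
    notify: Callable[[str], None],    # e.g. send a low-priority email
    already_gave_up: bool = False,    # persisted by the caller between runs
) -> str:
    """Try one restart; report what happened; never loop on a broken server."""
    # Step 1: is the webserver running?
    if is_running():
        return "running"
    # Step 3 (from an earlier run): a previous restart failed, so stay quiet.
    if already_gave_up:
        return "gave_up"
    # Step 2: try to start it and email that this happened.
    start()
    if is_running():
        notify("webserver was down; restarted it")
        return "restarted"
    # Step 3: it will not start, so assume something is really wrong.
    notify("webserver was down and would not start; not retrying")
    return "gave_up"
```

The caller is expected to remember a `"gave_up"` result (a flag file would do) and pass it back as `already_gave_up` on the next run, so a server that refuses to start produces one email rather than one per check.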
  • 98. MySQL Self-Healing
      Some MySQL issues “fixed” by the machines:
      - Kill long-running SELECT queries (marked safe to kill)
      - Queries not safe to kill are marked by the application as “NO KILL” in comments
      - Run EXPLAIN on killed queries, and report the results
      - Keep track of the query types and databases that need the most killing, produce a “DBs that Suck” report
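A rough sketch of the selection logic described above, operating on rows shaped like the output of MySQL's `SHOW FULL PROCESSLIST` (columns `Id`, `db`, `Command`, `Time`, `Info`). The function names and the 60-second threshold in the usage note are assumptions, and matching "NO KILL" as a plain substring is a simplification of however the application actually marks its queries.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def queries_to_kill(processlist: Iterable[Row], max_seconds: int) -> List[Row]:
    """Return long-running SELECTs that are not marked NO KILL."""
    victims = []
    for row in processlist:
        sql = row.get("Info") or ""
        if row.get("Command") != "Query":
            continue  # sleeping or replication threads are left alone
        if int(row.get("Time") or 0) <= max_seconds:
            continue  # not long-running yet
        if not sql.lstrip().upper().startswith("SELECT"):
            continue  # conservative: only plain SELECTs are candidates
        if "NO KILL" in sql:
            continue  # the application marked this query as unsafe to kill
        victims.append(row)
    return victims


def dbs_that_suck(killed: Iterable[Row]) -> Counter:
    """Tally kills per database for a 'DBs that Suck' style report."""
    return Counter(row.get("db") for row in killed)
```

A caller would issue `KILL <Id>` for each returned row, run `EXPLAIN` on its `Info` text, and feed the rows into `dbs_that_suck` for the report; for instance `queries_to_kill(rows, max_seconds=60)`.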
  • 103. MySQL Self-Healing
      Some MySQL replication issues “fixed” by the machines, by error:
      - Skip errors that can safely be skipped and restart slave threads
      - Force refetch of replication binlogs on:
          - 1064 (ER_PARSE_ERROR)
      - Re-run queries on:
          - 1205 (ER_LOCK_WAIT_TIMEOUT)
          - 1213 (ER_LOCK_DEADLOCK)
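The per-error handling above amounts to a lookup from replication error number to corrective action, which might look something like this sketch. The three codes 1064, 1205 and 1213 come from the slide. The codes listed as skippable are an assumption: they are the MySQL error numbers that appear to correspond to the "can’t drop", "already exists" and "duplicate key name" messages mentioned in the editor's notes, not a list taken from the talk. The action names are hypothetical.

```python
SKIP = "skip_and_restart_slave"
REFETCH = "refetch_binlogs"
RERUN = "rerun_query"
ESCALATE = "escalate_to_human"

# Replication error number -> corrective action.
ACTIONS = {
    1064: REFETCH,  # ER_PARSE_ERROR (from the slide)
    1205: RERUN,    # ER_LOCK_WAIT_TIMEOUT (from the slide)
    1213: RERUN,    # ER_LOCK_DEADLOCK (from the slide)
    # Assumed mapping of the "skippable" messages to error numbers:
    1050: SKIP,     # ER_TABLE_EXISTS_ERROR ("already exists")
    1061: SKIP,     # ER_DUP_KEYNAME ("duplicate key name")
    1091: SKIP,     # ER_CANT_DROP_FIELD_OR_KEY ("can't drop")
}


def action_for(errno: int) -> str:
    """Pick a corrective action; anything unrecognised goes to a human."""
    return ACTIONS.get(errno, ESCALATE)
```

Defaulting to escalation keeps the automation conservative: only errors someone has explicitly judged safe are handled without a person looking at them.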
  • 107. Code and Config Deploy Logs
      1. ESSENTIAL
      2. MANDATORY
  • 111. Communications
      • Internal IRC
          - For ongoing discussions
          - Logged, so “infinite” scrollback
      • IM Bot (built on libyahoo2.sf.net)
          - For production changes
          - Broadcasts all to all contacts
          - Logged, and injected into IRC
          - IM Status = who is in primary/secondary on-call
      • All of IRC and IM Bot slurped into a search index
  • 119. when; what; detailed what*; who; time of last deploy at top of ganglia (*also points to what commands should be used to back out the changes)
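The fields called out above (when, what, detailed what, who, and the back-out commands) could be captured in a small record type like the one below. This is only an illustrative sketch: the class, field and function names are hypothetical, and the slides do not describe how the log was actually stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class DeployLogEntry:
    when: datetime                 # timestamp, useful for correlating with graphs
    who: str                       # person who pushed the change
    what: str                      # one-line summary
    detail: str                    # the "detailed what"
    backout: Tuple[str, ...] = ()  # commands that would revert the change

    def summary(self) -> str:
        return f"{self.when:%Y-%m-%d %H:%M:%S} {self.who}: {self.what}"


def last_deploy(entries: Iterable[DeployLogEntry]) -> Optional[DeployLogEntry]:
    """Most recent entry, e.g. to display at the top of a metrics dashboard."""
    return max(entries, key=lambda e: e.when, default=None)
```

Keeping the back-out commands next to the change description means whoever is on call can revert a deploy without first reconstructing what was done.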
  • 124. IM Bot (timestamps help correlation). All IRC, IM bot into searchable history.
  • 129. Morals of Our Stories
      - Optimizations can be a Very Good Thing™
      - Weigh time spent optimizing against expected gains
      - Lean on others for how much “expected gains” mean for different scenarios
      - Plain old-fashioned intuition
  • 130. Some Wisdom Nuggets: Jon Prall’s 85 WebOps Rules: http://jprall.vox.com/library/post/85-operations-rules-to-live-by.html

Editor's notes

  1. Finding huge gains from tweaking gets harder as you turn the knobs and pull the levers.
  2. He had some evidence that suggested compilers and versions would make a difference for PHP.
  3. So, we did it. (not just for performance reasons, but still...) Will this happen to you, too? Maybe. Maybe not.
  4. Performance gains like this don’t come very often, for “free”.
  5-7. We don’t need 80% of what ImageMagick has.
  8. Cascading resizing: Original -> Large; Large -> Medium; Large -> Small; Medium -> Thumb; Medium -> Square
  9. This was done before OpenMP support was in. Compilers and optimization flags can make a difference!
  10. Enter OpenMP. Example of how
  11. These have been the revisions of our image processing hardware over time. 4x faster
  12-17. Examples are contact notifications, large photoset deletions, etc.
  18. Runbook hacks: tuning the process of operations failure handling and mitigation.
  19-20. Low water marks as indirect trouble indicators.
  21-26. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  27-30. Skippable errors: “can’t drop”, “already exists”, “duplicate key name”
  31. Simple tips and tricks that can help in fixing things when they break.
  32-33. This shouldn’t be considered optional.
  34-37. Our IRC and IM logs get injected into a search engine, almost exactly like Lucene.