SPOF - Single "Person" of Failure

4,629 views

A talk from the DevOps Days Silicon Valley 2015 conference that describes the signs of having, or being, a single-point-of-failure expert on your system, and ways to solve the problem.

Published in: Technology

  1. Single Point of Failure… Expert. Sasha Rosenbaum, @DivineOps
  2. Who am I? Sasha Rosenbaum: Azure & DevOps consultant at 10th Magnitude for 4 years; co-organizer of the DevOps Days Chicago conference and the Chicago Azure meetup
  3. What is a Single Point of Failure?
  4. A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working
  5. High Availability: achieving redundancy by removing single points of failure; having reliable cross-over capabilities to switch between components; detecting failures as they occur, so that cross-over can be initiated
  6. This is complicated
  7. Architecting for HA
  8. How is the entire system down?
  9. We forgot a dependency!
  10. Oh…
  11. Just imagine buying a server that: has an uptime of roughly 16 hours a day, with interruptions; is the single one of its kind; cannot be replicated!
  12. Humans are NOT highly available
  13. How did we get here? Lack of budget; lack of people; human nature
  14. How to recognize that you have a problem?
  15. 1
  16. Keys to the Kingdom
  17. TO MY PRODUCTION SERVER
  18. Even when the systems are automated, there are still humans who manage them
  19. Why is there a single admin? The situation evolved organically from having a small team; someone took over deliberately
  20. Role-Based Access: grant access based on a role/group; admin group size > 1; service accounts
  21. Make sure that the person on call has the necessary access to fix the problem
  22. TRUST YOUR PEOPLE!!!
  23. 2
  24. Beware of the Expert!
  25. “This will take 15 minutes to fix and 8 hours to explain”
  26. We cannot afford the loss of productivity!
  27. Can you afford losing this knowledge?
  28. Delegate to Juniors
  29. Juniors are wonderful people. They ask tough questions
  30. Your new hires haven’t yet caught the “This is how it’s always been” virus
  31. You are emotionally invested in your code. It is hard not to get protective of it
  32. Documentation: documents, readme, comments, tests, automation, features
  33. 3
  34. “I cannot afford to take vacation!”
  35. Job security?
  36. Productivity?
  37. Hours / Productivity
  38. Research shows that working longer hours DOES NOT increase productivity
  39. You need rest to be at your best!
  40. Cell phones are the single worst thing that happened to people AND businesses in the last century
  41. If people were actually unreachable, we would find a more reliable way to solve problems
  42. Mandatory Vacation
  43. Game Days
  44. Say NO to having a Single PERSON of Failure ;-)
  45. Great job, DoD Silicon Valley!
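
The role-based access slides (grant access by role/group, keep the admin group larger than one, make sure the person on call has the access they need) could be checked with a small audit script. The following is only a minimal sketch and is not from the talk: the `AccessPolicy` class, the role names, and the people are all hypothetical, and a real audit would read group membership from your directory or cloud provider rather than from an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical in-memory model: role name -> set of member names."""
    roles: dict[str, set[str]] = field(default_factory=dict)

    def members(self, role: str) -> set[str]:
        # Unknown roles are treated as having no members.
        return self.roles.get(role, set())


def single_person_roles(policy: AccessPolicy) -> list[str]:
    """Return roles held by exactly one person (a single person of failure)."""
    return sorted(role for role, people in policy.roles.items() if len(people) == 1)


def on_call_gaps(policy: AccessPolicy, on_call: str, required_roles: list[str]) -> list[str]:
    """Return the required roles that the on-call person does not hold."""
    return [role for role in required_roles if on_call not in policy.members(role)]


# Illustrative data only; names and roles are made up.
policy = AccessPolicy(roles={
    "prod-admin": {"alice"},
    "db-admin": {"alice", "bob"},
    "deploy": {"bob", "carol"},
})

print(single_person_roles(policy))
print(on_call_gaps(policy, "bob", ["prod-admin", "db-admin"]))
```

Run on a schedule, a check like this would flag a role with a single member before that person goes on vacation, and would flag an on-call rotation entry that lacks the access needed to fix production.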