Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Have You Tried Turning It Off And Turning It On Again

560 Aufrufe

Veröffentlicht am

Most of us have a backup strategy, many of us have a restore strategy and several of us have a fully tested restore strategy. But backups are not the whole story. I'll talk about the parts of disaster recovery we're less prepared for, and dependencies that you might not think about until one day when you really do turn an entire service, entire site or (perish the thought!) an entire company off and on again.

This talk will cover managing complexity, testing your fallback plan and avoiding dependency cycles that make it impossible to restart groups of systems. Like, where do you store the documentation on how to recover the documentation server?

Talk from
- LISA 2017, November 1st 2017
- Velocity New York, October 3rd, 2017
- SRECon Europe, September 1st, 2017 (Plenary)

Veröffentlicht in: Technologie
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Have You Tried Turning It Off And Turning It On Again

  1. 1. Have you tried turning it off and turning it on again? Tanya Reilly (@whereistanya)
  2. 2. Where are you in the stack? CMS MY SKILLZ RESUME
  3. 3. Where are you in the stack? MY SKILLZ RESUME CMS
  4. 4. Where are you in the stack? MY SKILLZ RESUME CMS CMS ???
  5. 5. Where are you in the stack? CMS!! CMS ??? MY SKILLZ RESUME
  6. 6. What is it?
  7. 7. Abstractions let us specialise
  8. 8. A slidedoc is a visual document, developed in presentation software, that is intended to be read and referenced instead of projected. 8
  9. 9. ● fallback plans ● dependencies ● what can we do?
  10. 10. Fallback plans
  11. 11. 11 backups Of course! I copied it to /tmp/ database.bak Backup_Backup_Backup_-_And_Test_Restores, T-Town Photo Booth CC BY 2.0Backup, Backup Backup and Test Restores, T-Town Photo Booth CC BY 2.0
  12. 12. If you haven't tested your backups, you don't have backups.
  13. 13. the identical(ish!) failover site Take two, they’re small
  14. 14. 1. Is it real?
  15. 15. 2. Is it up to date? Retro, Melinda Seckington CC BY 2.0
  16. 16. 3. Is the idea of actually using it kind of terrifying? FreeImages.com/barun patro
  17. 17. Two sites? I’ve forgotten how to count that low replicated everything
  18. 18. ● fallback plans ● dependencies ● what can we do?
  19. 19. (Micro)services
  20. 20. (Micro)services
  21. 21. Jenga Tower of 50 blocks, Johannes Bader CC BY 2.0
  22. 22. Everybody's backend is someone else's frontend
  23. 23. “ "A service cannot be more available than the intersection of all its critical dependencies." -- "The Calculus of Service Availability" Ben Treynor, Mike Dahlin, Vivek Rau, Betsy Beyer
  24. 24. Your stack… is really more of a "pile", isn't it? dependency cycles
  25. 25. uh... control plane cycles ŠJů CC BY 4.0
  26. 26. 26 Why would we ever restart it? machines that run forever (until they don't)
  27. 27. 27 root@X1:/# halt Connection to x1 closed by remote host. root@x3:/# halt Connection to x3 closed by remote host. $ root@x2:/# halt Connection to x2 closed by remote host. root@x10000:/# halt Connection to x10000 closed by remote host $ Global simultaneous reboot (doesn’t usually look like this) root@x10001:/# halt Connection to x10001 closed by remote host. $
  28. 28. tick tick tick...
  29. 29. tick tick tick...
  30. 30. tick tick tick...
  31. 31. tick tick tick...
  32. 32. tick tick tick...
  33. 33. tick tick tick...
  34. 34. tick tick tick...
  35. 35. tick tick tick...
  36. 36. tick tick tick...
  37. 37. tick tick tick...
  38. 38. ● fallback plans ● dependencies ● what can we do?
  39. 39. dependency discovery We're built on top of... really? Are you sure? It’s Turtles All The Way Down, William Warby. CC BY 2.0
  40. 40. Original design. "We keep an in-memory index of the available cats with some metadata about them. The cat chooser uses the information from the user's request to determine the attributes that needs to be fulfilled. It checks the index for cats that match those attributes. It pulls those cats from storage and returns one to the user. Indexers run at startup and at intervals to update the metadata and make sure we have accurate cat sentiment analysis and geolocation data." design docs obscure information
  41. 41. Original design. "We keep an in-memory index of the available cats with some metadata about them. The cat chooser uses the information from the user's request to determine the attributes that needs to be fulfilled. It checks the index for cats that match those attributes. It pulls those cats from storage and returns one to the user. Indexers run at startup and at intervals to update the metadata and make sure we have accurate cat sentiment analysis and geolocation data." User story: cat must be a cat. "Since catness analysis is an expensive operation, we only run the catness analyzer on pictures that have a high likelihood of being served, i.e., those aleady indexed with geolocation and sentiment data. We'll take the index from the cat chooser and run catness analysis over the cats from that queue." design docs obscure information
  42. 42. webserver -> cat_chooser cat_chooser -> cat_geotagger cat_chooser -> cat_sentiment_analyzer cat_chooser -> catness_analyzer cat_chooser -> storage catness_analyzer -> cat_chooser catness_analyzer -> storage cat_geotagger -> storage cat_sentiment_analyzer -> storage lists are better
  43. 43. digraph { webserver -> cat_chooser cat_chooser -> cat_geotagger cat_chooser -> cat_sentiment_analyzer cat_chooser -> catness_analyzer cat_chooser -> storage catness_analyzer -> cat_chooser catness_analyzer -> storage cat_geotagger -> storage cat_sentiment_analyzer -> storage } pictures are even better
  44. 44. digraph { webserver -> cat_chooser cat_chooser -> cat_geotagger cat_chooser -> cat_sentiment_analyzer cat_chooser -> catness_analyzer cat_chooser -> storage catness_analyzer -> cat_chooser catness_analyzer -> storage cat_geotagger -> storage cat_sentiment_analyzer -> storage } pictures are even better $ /usr/bin/dot -T png -o cats.png cats.dot
  45. 45. But don't put humans in charge of remembering what connects to what
  46. 46. dependency policies This is Site Reliability. THERE ARE RULES.
  47. 47. Are you on the guest list? policy enforcement
  48. 48. policy enforcement Are you on the guest list? kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: access-nginx spec: podSelector: matchLabels: run: nginx ingress: - from: - podSelector: matchLabels: access: "true"
  49. 49. Manage your dependencies early (It’s harder later on) Some sock yarn, Kara, CC BY-ND 2.0
  50. 50. architecting in layers Look, an actual stack!
  51. 51. soft dependencies
  52. 52. The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement. The reserve parachute is always packed by an expert reliable fallback plans
  53. 53. 53 testing Really, try turning it off and turning it on again
  54. 54. in conclusion...
  55. 55. ❏ small fallback plans ❏ test everything ❏ manage dependencies ❏ architect in layers fallback plans: small, simple, solid
  56. 56. if it's not tested, assume it doesn't work ❏ small fallback plans ❏ test everything ❏ manage dependencies ❏ architect in layers if it's not tested, assume it doesn't work
  57. 57. manage your dependencies before you need to ❏ small fallback plans ❏ test everything ❏ manage dependencies ❏ architect in layers
  58. 58. ❏ small fallback plans ❏ test everything ❏ manage dependencies ❏ architect in layers dependencies go down the stack
  59. 59. hope is not a strategy @whereistanya https://github.com/whereistanya/graphviz/ Presentation template by slidescarnival.com

×