
Never gonna give you up

7 lessons learned building high availability, high performance systems

Published in: Internet

  1. @EdMcBane 7 lessons learned building HP/HA systems: Never gonna give you up, never gonna let you down
  2. Francesco Degrassi: enthusiastic yet pragmatic Lean software developer. Uppish and cynical nihilist from time to time.
  3. Lean Software Development and team coaching; Continuous Delivery, high availability, performance; security-sensitive and high-uncertainty domains
  4. The challenge
     ● A primary European client
     ● Innovative service for the consumer market
     ● Large user base (200K+ users)
     ● Very high request rate
     ● Low latency requirement (<< RTT)
  5. What we built
  6. What did we learn?
  7. Make your assumptions explicit and keep testing them. Don't eat the yellow snow.
  8. #1 Make your assumptions explicit and keep challenging them
  9. #2 Performance and high availability are not extra features
  11. #3 Do not reinvent the wheel... but keep things simple
  13. In our case...
      ● Everything was good in the single-core scenario
  14. SO_REUSEPORT: For TCP, SO_REUSEPORT allows multiple listening sockets to be bound to the same port. Incoming connections are distributed across the sockets bound to that port using a hash of the connection's 4-tuple (source and destination addresses and ports). With SO_REUSEPORT the distribution is roughly uniform.
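A minimal Python sketch of the setup described above, assuming Linux 3.9 or later (the release that added SO_REUSEPORT) and a Python build that exposes `socket.SO_REUSEPORT`. The helper names `reuseport_listeners` and `accept_one` are hypothetical. In a real server each listener would typically belong to its own worker process or thread, roughly one per core, so the kernel spreads connections across them; here all listeners live in one process only to keep the example short.

```python
import select
import socket


def reuseport_listeners(count: int, host: str = "127.0.0.1", port: int = 0) -> list:
    """Open `count` TCP listening sockets bound to the same (host, port).

    With port=0 the kernel picks a free port for the first socket; the
    remaining sockets then bind to that same port, which only succeeds
    because SO_REUSEPORT is set on every one of them before bind().
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    listeners = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            listeners.append(s)  # track it first so it is closed on failure
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            s.listen(128)
            port = s.getsockname()[1]  # later sockets reuse the chosen port
    except OSError:
        for s in listeners:
            s.close()
        raise
    return listeners


def accept_one(listeners: list, timeout: float = 2.0):
    """Wait until one listener has a pending connection and accept it.

    Each new connection lands in exactly one listener's accept queue;
    on Linux, which one is chosen by hashing the connection 4-tuple.
    """
    ready, _, _ = select.select(listeners, [], [], timeout)
    if not ready:
        raise TimeoutError("no pending connection")
    conn, _addr = ready[0].accept()
    return ready[0], conn
```

Connecting a client to the shared port and calling `accept_one` shows the connection arriving on exactly one of the listeners.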
  15. “Everything should be made as simple as possible, but not simpler.” (Albert Einstein)
  16. LESS(1) General Commands Manual LESS(1)
      NAME
        less - opposite of more
      SYNOPSIS
        less -?
        less --help
        less -V
        less --version
        less [-[+]aABcCdeEfFgGiIJKLmMnNqQrRsSuUVwWX~] [-b space] [-h lines] [-j line] [-k keyfile] [-{oO} logfile] [-p pattern] [-P prompt] [-t tag] [-T tagsfile] [-x tab,...] [-y lines] [-[z] lines] [-# shift] [+[+]cmd] [--] [filename]...
        (See the OPTIONS section for alternate option syntax with long option names.)
      DESCRIPTION
        LESS IS similar to MORE (1), but has many more features. Less does not have to read the entire input file before starting, so with large input files it starts up faster than text editors like vi (1). Less uses termcap (or terminfo on some systems), so it can run on
      Manual page less(1) line 1 (press h for help or q to quit)
  17. #4 Be wary of cargo-cult optimization
  19. TCP_TW_RECYCLE
      “Enable fast recycling TIME-WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.”
      Linux will drop any segment from a remote host whose timestamp is not strictly greater than the latest timestamp recorded for that host.
      TCP_TW_RECYCLE + NAT = MADNESS
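To see why the combination is dangerous, here is a deliberately simplified Python model of the rule stated above: the server remembers the latest TCP timestamp seen per remote IP address and drops any segment whose timestamp is not strictly greater. The class name and the addresses are illustrative, and the model ignores details of the real kernel code path, such as the time window and tolerance it applies. Behind a NAT, many clients share one public IP but each runs its own timestamp clock, so a client whose clock happens to be behind another's gets silently dropped. Note that `net.ipv4.tcp_tw_recycle` was later removed from the kernel entirely, in Linux 4.12.

```python
from dataclasses import dataclass, field


@dataclass
class TimestampGate:
    """Toy model of the per-host timestamp check described on the slide.

    Not the kernel implementation: it keeps the latest timestamp seen for
    each remote IP and rejects anything that is not strictly newer.
    """

    latest: dict = field(default_factory=dict)

    def accept(self, src_ip: str, tsval: int) -> bool:
        """Return True if the segment is accepted, False if it is dropped."""
        last = self.latest.get(src_ip)
        if last is not None and tsval <= last:
            return False  # not strictly greater than the recorded timestamp
        self.latest[src_ip] = tsval
        return True


if __name__ == "__main__":
    gate = TimestampGate()
    nat_ip = "203.0.113.7"  # documentation address standing in for the NAT's public IP
    # Two laptops behind the same NAT, with unrelated timestamp clocks.
    print(gate.accept(nat_ip, 5_000_000))  # laptop A connects: True
    print(gate.accept(nat_ip, 1_200))      # laptop B, clock far behind: False
    print(gate.accept(nat_ip, 1_300))      # laptop B retries: still False
```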
  21. #5 High availability is much more than just redundancy
  23. Failure impact
      ● Redundant hardware
      ● Redundant software components
      But there’s more!
      ● Graceful degradation
      ● Incremental rollouts
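One common way to implement incremental rollouts is deterministic percentage bucketing: hash each user ID into one of 100 buckets and enable the new code path only for buckets below the current rollout percentage, so a faulty release reaches only that share of users. This Python sketch is one assumed approach, not necessarily what the team behind the talk used; the function names and the choice of SHA-256 are illustrative.

```python
import hashlib


def rollout_bucket(feature: str, user_id: str) -> int:
    """Map (feature, user) to a stable bucket in [0, 100).

    Salting with the feature name keeps different rollouts from always
    hitting the same users first.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(feature: str, user_id: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` percent of buckets.

    Raising `percent` only ever adds users, so anyone who already has the
    new version keeps it as the rollout widens.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(feature, user_id) < percent
```

Widening the rollout then becomes a configuration change (for example 1, 10, 50, then 100 percent), and rolling back means setting the percentage to 0.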
  24. Failure frequency
      But then also:
      ● Proven technology
      ● High-quality hardware
      ● Automation (to avoid errors)
  25. Time to recover
      ● Effective monitoring
        ○ realtime
        ○ reliable
        ○ understandable
        ○ thorough
        ○ meaningful
        ○ actionable
      ● Rollback / rollforward
      ● Automation (for speed)
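The three levers on these slides (failure impact, failure frequency, time to recover) map onto a standard back-of-the-envelope model: steady-state availability A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to recover. The Python sketch below is an illustration under simplifying assumptions, in particular that redundant replicas fail independently, which shared bugs, bad rollouts, or shared infrastructure routinely violate. The function names and the example figures (1000 h MTBF, 10 h MTTR, failures hitting 10% of users) are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time a component is up.

    Failure frequency drives MTBF; time to recover is MTTR.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(a: float, replicas: int) -> float:
    """Availability when the service is up as long as one replica is up.

    ASSUMPTION: replicas fail independently. Correlated failures make the
    real number much worse than this formula suggests.
    """
    return 1 - (1 - a) ** replicas


def user_downtime_hours_per_year(a: float, impact_fraction: float) -> float:
    """Expected downtime per year, weighted by the share of users a failure hits.

    Failure impact enters here: graceful degradation and incremental rollouts
    shrink `impact_fraction` even when `a` does not change.
    """
    return (1 - a) * HOURS_PER_YEAR * impact_fraction


if __name__ == "__main__":
    base = availability(1000, 10)               # about 0.9901
    faster = availability(1000, 5)              # halving MTTR: about 0.9950
    redundant = parallel_availability(base, 2)  # about 0.99990, if independent
    print(f"{base:.4f} {faster:.4f} {redundant:.5f}")
    print(f"{user_downtime_hours_per_year(base, 0.1):.1f} h/year")  # about 8.7
```

The point the slides make shows up in the numbers: halving recovery time or limiting blast radius can buy as much as adding hardware, without relying on the independence assumption.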
  26. Our response plan goes something like this... AaaaaAAaaaah
  27. ...but be prepared to improvise
      ● In-house experience
      ● Developers on call
      ● Drills (chaos monkeys)
      Processes designed for ordinary times are not resilient in a crisis and need to be changed.
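In the spirit of the chaos-monkey drills mentioned above, one possible (assumed) approach is a fault-injection wrapper that makes a dependency fail with a configurable probability, so that fallbacks and on-call procedures get exercised on purpose rather than for the first time during an incident. `inject_faults`, `InjectedFault`, `fetch_profile`, and `profile_or_degraded` are hypothetical names; the seeded `random.Random` only makes runs reproducible.

```python
import functools
import random


class InjectedFault(RuntimeError):
    """Raised instead of running the wrapped call during a drill."""


def inject_faults(probability: float, rng: random.Random):
    """Decorator: with the given probability, raise InjectedFault instead of calling."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if rng.random() < probability:
                raise InjectedFault(f"drill: injected failure in {fn.__name__}")
            return fn(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical dependency, failing about 30% of the time while the drill is on.
@inject_faults(0.3, random.Random(42))
def fetch_profile(user_id: str) -> dict:
    return {"id": user_id, "recommendations": ["a", "b"]}


def profile_or_degraded(user_id: str) -> dict:
    """Graceful degradation: serve a reduced response instead of an error."""
    try:
        return fetch_profile(user_id)
    except InjectedFault:
        return {"id": user_id, "recommendations": []}
```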
  28. #6 Embrace diversity
  31. #7 Monitoring is essential... and we can do way better
  32. No one size fits all
      ● “Monitor everything”, like “100% test coverage”, is a nice slogan.
      ● Each environment requires a slightly different solution.
      ● Balance data availability, cost, and the ability to keep it actionable.
  34. We are doing logging wrong
      ● Unstructured
      ● Inconsistent
      ● Poor defaults
      ● Complex, obscure components
      ● A huge waste of computing power
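As a small counterexample to unstructured, inconsistent logs, here is a Python sketch using only the standard `logging` and `json` modules: every record becomes one JSON object per line, always with the same keys. The key names and the `request_id` field (supplied through the standard `extra=` argument) are assumptions for illustration, not a standard schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format every record as one JSON object per line, always with the same keys."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # Hypothetical correlation field, passed via extra={"request_id": ...};
            # it is None when the caller does not provide it.
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry, sort_keys=True)


def make_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` and nowhere else."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger("app", buf)
    log.info("cache miss", extra={"request_id": "r-123"})
    print(buf.getvalue().strip())
```

Because every line has the same shape, filtering and aggregating no longer needs fragile regular expressions over free-form text.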
  35. We need a complete overview
      ● Logs
      ● Metrics
      ● Alerts
      ● Together, coherent, cross-referenced
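A toy Python illustration of "cross-referenced": an error-rate alert over a sliding window of recent requests that, when it fires, carries a few of the failing request IDs, so whoever is on call can jump straight to the matching log entries. It assumes the logs carry the same `request_id` field; the class, its fields, and the alert payload are hypothetical and do not mirror any particular monitoring product.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ErrorRateAlert:
    """Sliding-window error-rate alert whose payload points back at the logs."""

    threshold: float  # fire when the error rate exceeds this, e.g. 0.05 = 5%
    window: int = 100  # number of most recent requests considered
    samples: deque = field(default_factory=deque)

    def record(self, request_id: str, ok: bool) -> None:
        self.samples.append((request_id, ok))
        if len(self.samples) > self.window:
            self.samples.popleft()  # forget the oldest request

    def check(self, max_ids: int = 5) -> Optional[dict]:
        """Return an alert payload if the rate is above threshold, else None."""
        if not self.samples:
            return None
        failed = [rid for rid, ok in self.samples if not ok]
        rate = len(failed) / len(self.samples)
        if rate <= self.threshold:
            return None
        return {
            "metric": "error_rate",
            "value": rate,
            "threshold": self.threshold,
            # ASSUMPTION: these IDs also appear in the structured logs.
            "request_ids": failed[:max_ids],
        }
```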
  36. “Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so.” (Douglas Adams)
  37. Thanks!
      @EdMcBane
      fdegrassi@gmail.com
      francesco.degrassi@optionfactory.net
      http://www.optionfactory.net/blog