Agility Requires Safety

To go faster in a car, you need not only a powerful engine, but also safety mechanisms like brakes, air bags, and seat belts. This is a talk about the safety mechanisms that allow you to build software faster. It's based on the book "Hello, Startup" (http://www.hello-startup.net/). You can find the video of the talk here: https://www.youtube.com/watch?v=4fKm6ImKml8

Published in: Software

Agility Requires Safety

  1. 1. AGILITY requires SAFETY
  2. 2. Every startup has the same story:
  3. 3. “We don’t have time for best practices.”
  4. 4. You can’t go faster by being reckless
  5. 5. Think of cars on a highway
  6. 6. What happens if everyone jams down on the gas?
  7. 7. To go fast, a car needs not only a powerful engine…
  8. 8. But also powerful brakes.
  9. 9. As well as seat belts, airbags, bumpers, and auto-pilot
  10. 10. For cars and for software, speed is limited by safety
  11. 11. What are the seat belts, brakes, & self-driving cars of software?
  12. 12. This talk is about safety mechanisms
  13. 13. That make it possible to build software quickly
  14. 14. I’m Yevgeniy Brikman ybrikman.com
  15. 15. Founder of Atomic Squirrel atomic-squirrel.net
  16. 16. PAST LIVES
  17. 17. Author of Hello, Startup hello-startup.net
  18. 18. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  19. 19. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  20. 20. Good brakes stop your car before you run into something
  21. 21. Continuous integration stops buggy code before it goes into production
  22. 22. Imagine your goal is to build the International Space Station
  23. 23. Each team designs and builds their component in isolation
  24. 24. You launch everything into space and hope it all comes together
  25. 25. I thought the Russians were going to build the bathrooms?
  26. 26. Weren’t the French supposed to do the wiring?
  27. 27. Everyone is using the metric system, right?
  28. 28. Teams working for a long time with incorrect assumptions
  29. 29. Finding this out when you’re in outer space is too late
  30. 30. This is the result of “late integration”
  31. 31. Lots of teams working in isolation on separate branches
  32. 32. Before attempting a massive merge at the very end
  33. 33. MERGE CONFLICT
  34. 34. The alternative is “continuous integration”
  35. 35. Where everyone regularly merges their work
  36. 36. The most common approach is trunk-based development
  37. 37. Everyone works on a single branch (trunk)
  38. 38. That can’t possibly scale to a lot of developers, can it?
  39. 39. Uses trunk-based development for 1,000+ developers
  40. 40. Uses trunk-based development for 4,000+ developers
  41. 41. Uses trunk-based development for 20,000+ developers
  42. 42. Wouldn’t you have merge conflicts all the time?
  43. 43. If you merge (commit) regularly, conflicts are rare.
  44. 44. And those that happen are from a day of work—not months.
  45. 45. Commit early and often.
  46. 46. Small commits are easier to merge, test, revert, review
  47. 47. Wouldn’t there constantly be broken code in trunk?
  48. 48. Not if you run a self-testing build after every commit
  49. 49. It should compile your code and run your automated tests
  50. 50. If a build fails, a developer must fix it ASAP or revert the commit
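A self-testing build can be sketched as a script that runs each step in order and stops at the first failure. This is a minimal illustration, not the setup used in the talk: the `run_build` helper and the two step lists are hypothetical, and the "compile" and "test" commands are stand-ins for a real compiler and test runner.

```python
import subprocess
import sys


def run_build(steps):
    """Run each (name, command) step in order; stop at the first failure.

    Returns (ok, failed_step_name). A real CI server would also report the
    result so that the commit can be fixed or reverted.
    """
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return False, name
    return True, None


# Hypothetical steps: stand-ins for "compile your code" and "run your tests".
passing_steps = [
    ("compile", [sys.executable, "-c", "compile('x = 1', 'app.py', 'exec')"]),
    ("test", [sys.executable, "-c", "assert 1 + 1 == 2"]),
]
failing_steps = [
    ("compile", [sys.executable, "-c", "pass"]),
    ("test", [sys.executable, "-c", "assert 1 + 1 == 3"]),
]

print(run_build(passing_steps))
print(run_build(failing_steps))
```

Running every step after every commit is what keeps trunk releasable: a failing step names the stage that broke while the change is still small.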
  51. 51. Of course, this depends on having good automated tests
  52. 52. Tests give you the confidence to make changes quickly
  53. 53. How long would it take you to run 259 tests manually?
        JUnit version 4.11
        ...
        Time: 6.063
        OK (259 tests)
  54. 54. What should you test?
  55. 55. Everything!
  56. 56. Everything!
  57. 57. It’s a trade-off between: 1. Likelihood of bugs 2. Cost of bugs 3. Cost of testing
  58. 58. Likelihood of bugs is higher for complex code and large teams
  59. 59. Cost of bugs is higher for some systems (payments, security)
  60. 60. Cost of tests is higher for integration and UI tests
  61. 61. “Without continuous integration, your software is broken until somebody proves it works, usually during a testing or integration stage.
  62. 62. With continuous integration, your software is proven to work (assuming a sufficiently comprehensive set of automated tests) with every new change—and you know the moment it breaks and can fix it immediately.”
  63. 63. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  64. 64. Ships have bulkheads to try to contain flooding to one area.
  65. 65. You can split up a codebase to contain problems to one area.
  66. 66. Code is the enemy: the more you have, the slower you go
  67. 67. Bug density by project size:
        Project size (lines of code)    Bug density (bugs per thousand lines of code)
        < 2K                            0 – 25
        2K – 16K                        0 – 40
        16K – 64K                       0.5 – 50
        64K – 512K                      2 – 70
        > 512K                          4 – 100
  68. 68. As the code grows, the number of bugs grows even faster
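A quick worked example of "bugs grow even faster", using only the upper bounds of the smallest and largest brackets in the table. The two project sizes (1,000 and 1,000,000 lines) are illustrative assumptions, not figures from the talk.

```python
def worst_case_bugs(lines_of_code, max_bugs_per_kloc):
    """Upper-bound bug count: size in thousands of lines times max density."""
    return lines_of_code * max_bugs_per_kloc // 1000


# Assumed example sizes inside the "< 2K" and "> 512K" brackets.
small = worst_case_bugs(1_000, 25)
large = worst_case_bugs(1_000_000, 100)

code_growth = 1_000_000 // 1_000
bug_growth = large // small

print(small, large)             # 25 100000
print(code_growth, bug_growth)  # 1000 4000
```

Under these assumptions the code grew by a factor of 1,000 while the worst-case bug count grew by a factor of 4,000, because both the size and the density went up.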
  69. 69. “Software development doesn't happen in a chart, an IDE, or a design tool; it happens in your head.”
  70. 70. The mind can only handle so much complexity at once
  71. 71. One solution is to break the code into multiple codebases
  72. 72. Instead of depending on the source of another module /moduleA /moduleB /moduleC /moduleD /moduleE
  73. 73. You depend on a versioned artifact from that module moduleA-0.3.1.jar moduleB-3.1.0.jar moduleC-9.8.0.jar moduleD-1.4.3.jar moduleE-0.5.6.jar
  74. 74. This provides isolation from changes in other modules moduleA-0.3.1.jar moduleB-3.1.0.jar moduleC-9.8.0.jar moduleD-1.4.3.jar moduleE-0.5.6.jar
  75. 75. You already do this: guava-18.0.jar jquery-2.2.0.js
  76. 76. Advantages of artifacts: 1. Isolation 2. Decoupling 3. Faster builds
  77. 77. Disadvantages of artifacts: 1. Dependency hell 2. No continuous integration 3. Hard to make global changes
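"Dependency hell" usually shows up when two modules pin different versions of the same artifact. A minimal sketch of detecting that situation follows; the `find_version_conflicts` helper, the dependency graph, and the older `9.7.2` pin are all hypothetical and only reuse the module names from the slides for illustration.

```python
from collections import defaultdict


def find_version_conflicts(requirements):
    """Find dependencies that are pinned to more than one version.

    requirements maps module -> {dependency: pinned version}.
    Returns dependency -> {version: [modules that pin it]}.
    """
    versions = defaultdict(lambda: defaultdict(list))
    for module, deps in requirements.items():
        for dep, version in deps.items():
            versions[dep][version].append(module)
    return {
        dep: dict(by_version)
        for dep, by_version in versions.items()
        if len(by_version) > 1
    }


# Hypothetical pins: moduleA and moduleB disagree about moduleC.
requirements = {
    "moduleA": {"moduleC": "9.8.0"},
    "moduleB": {"moduleC": "9.7.2"},
    "moduleD": {"moduleE": "0.5.6"},
}

conflicts = find_version_conflicts(requirements)
print(conflicts)
```

With source dependencies this conflict cannot occur, since there is only one copy of moduleC. That is the trade-off against the isolation artifacts provide.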
  78. 78. Another option is to break the codebase into services
  79. 79. In a monolith, you use function calls within one process A.a() B.b() C.c() D.d() E.e()
  80. 80. With services, you pass messages between processes http://A/a http://B/b http://C/c http://D/d http://E/e
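The difference between the two styles can be sketched with the standard library alone. In this illustration the function `a()`, the `/a` route, and the `call_service` helper are assumptions made up for the example; the server binds to a random local port rather than a real host named `A`.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def a():
    """The logic of module A; in a monolith you would just call this."""
    return {"result": "hello from A"}


class ServiceA(BaseHTTPRequestHandler):
    """The same logic exposed as a service at GET /a."""

    def do_GET(self):
        if self.path == "/a":
            body = json.dumps(a()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Ignore any proxy settings so the request goes straight to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call_service(url, timeout=2.0):
    """Remote call: unlike a(), this can fail because of I/O."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError) as error:
        return {"error": str(error)}


server = HTTPServer(("127.0.0.1", 0), ServiceA)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_port

in_process = a()                            # function call within one process
over_http = call_service(base_url + "/a")   # message between processes
missing = call_service(base_url + "/nope")  # remote calls need error handling

server.shutdown()
server.server_close()
```

The result is the same either way, but the service version needs serialization, a timeout, and an error path that the plain function call never had.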
  81. 81. Advantages of services: 1. Technology agnostic 2. Scalability 3. Isolation
  82. 82. Disadvantages of services: 1. Operational overhead 2. Performance overhead 3. I/O, error handling 4. Backwards compatibility 5. Hard to make global changes
  83. 83. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  84. 84. Autopilot prevents accidents caused by human error
  85. 85. Automated deployments prevent accidents caused by human error
  86. 86. Deploying code can be painful
  87. 87. “If it hurts, do it more often.” – Martin Fowler
  88. 88. The deployment process should be:
  89. 89. That means you should never deploy or configure manually
  90. 90. Don’t do this:
        > ssh ec2-user@12.34.56.78
        Amazon ECS-Optimized Amazon Linux AMI 2015.09.d
        [ec2-user ~]$ sudo yum install ruby
  91. 91. Or this
  92. 92. Instead, automate everything
  93. 93. The gold standard is the blue-green deployment
  94. 94. Let’s say you have version 0.0.1 of your app deployed
  95. 95. First, deploy version 0.0.2 on a duplicate set of servers
  96. 96. If everything looks good, switch the load balancer over to 0.0.2
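The blue-green switch can be modeled in a few lines. This is a toy sketch under stated assumptions: the `Environment` and `LoadBalancer` classes, the `healthy` flag, and `blue_green_deploy` are hypothetical stand-ins for real server groups, health checks, and load balancer configuration.

```python
class Environment:
    """A set of servers running one version of the app."""

    def __init__(self, version, healthy=True):
        self.version = version
        self.healthy = healthy

    def handle(self, request):
        return "%s served by %s" % (request, self.version)


class LoadBalancer:
    """Sends all traffic to whichever environment is currently live."""

    def __init__(self, live):
        self.live = live

    def route(self, request):
        return self.live.handle(request)


def blue_green_deploy(balancer, new_env):
    """Switch traffic to new_env only if it looks good; otherwise do nothing."""
    if not new_env.healthy:
        return False
    balancer.live = new_env
    return True


blue = Environment("0.0.1")
green = Environment("0.0.2")
balancer = LoadBalancer(blue)

print(balancer.route("GET /"))  # GET / served by 0.0.1
blue_green_deploy(balancer, green)
print(balancer.route("GET /"))  # GET / served by 0.0.2
```

Because the old environment keeps running until the switch, a rollback is the same operation pointed back at it.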
  97. 97. Four main categories of deployment automation tools:
  98. 98. 1. Configuration management: Chef, Puppet, Ansible, Salt
  99. 99. Imperative scripts to configure servers and deploy code:
        - name: Install httpd and php
          yum: name={{ item }} state=present
          with_items:
            - httpd
            - php
        - name: start httpd
          service: name=httpd state=started enabled=yes
        - name: Copy the code from repository
          git: repo={{ repository }} dest=/var/www/html/
  100. 100. 2. Provisioning tools: Terraform, CloudFormation, Heat
  101. 101. Declarative templates that define your infrastructure:
        resource "aws_instance" "example" {
          ami           = "ami-b960b1d"
          instance_type = "t2.micro"
        }

        resource "aws_eip" "ip" {
          instance   = "${aws_instance.example.id}"
          depends_on = ["aws_instance.example"]
        }
  102. 102. 3. Virtual machines: VMware, VirtualBox, Packer, Vagrant
  103. 103. Images of configured servers:
        {
          "builders": [{
            "type": "amazon-ebs",
            "source_ami": "ami-de0d9eb7",
            "instance_type": "m1.medium",
            "ami_name": "example-packer-ami-{{timestamp}}"
          }],
          "provisioners": [{
            "type": "shell",
            "inline": [
              "sudo apt-get -y update",
              "sudo apt-get -y install httpd php"
            ]
          }]
        }
  104. 104. 4. Containers: Docker, rkt, LXD
  105. 105. Lightweight images of configured servers:
        FROM ubuntu:12.04
        RUN apt-get update && apt-get install -y apache2 php
        ENV APACHE_RUN_USER www-data
        ENV APACHE_LOG_DIR /var/log/apache2
        EXPOSE 80
        CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]
  106. 106. These tools allow you to define your infrastructure as code
  107. 107. That way, you can version it, review it, test it, and reuse it.
  108. 108. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  109. 109. Elisha Otis demoing elevator free-fall safety in 1854
  110. 110. The safety elevator patent
  111. 111. The safety catches are locked by default
  112. 112. Only an intact cable can unlock the latches
  113. 113. This elevator provides safety by default
  114. 114. Feature toggles provide safety by default
  115. 115. New feature, part 1 New feature, part 2 New feature, part 3 If a large new feature takes many commits, wouldn’t a user see it in an unfinished state?
  116. 116. Let’s say you were adding a new section to your website.
        <section id="new-section">
          <!-- Code for new section -->
        </section>
        <section id="original-section">
          <!-- Code for original section -->
        </section>
  117. 117. Wrap new code in a conditional that looks up a feature toggle
        <% if toggles.enabled("new-section") %>
          <section id="new-section">
            <!-- Code for new section -->
          </section>
        <% end %>
        <section id="original-section">
          <!-- Code for original section -->
        </section>
  118. 118. Toggles are off by default, so users won’t see unfinished work
        <% if toggles.enabled("new-section") %>
          <section id="new-section">
            <!-- Code for new section -->
          </section>
        <% end %>
        <section id="original-section">
          <!-- Code for original section -->
        </section>
  119. 119. You can enable feature toggles in a config file.
        development:
          feature_toggles:
            new-section: true
        production:
          feature_toggles:
            new-section: false
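The "off by default" behavior is the important part, and it can be sketched independently of any template language. The `FeatureToggles` class and `render_page` function below are hypothetical; the `CONFIG` dictionary mirrors the per-environment config file from the slide.

```python
class FeatureToggles:
    """Looks up toggles for one environment; anything unknown is off."""

    def __init__(self, config, environment):
        self._toggles = config.get(environment, {}).get("feature_toggles", {})

    def enabled(self, name):
        # A missing toggle, or any value other than True, counts as off.
        return self._toggles.get(name, False) is True


# Same shape as the config file: one block of toggles per environment.
CONFIG = {
    "development": {"feature_toggles": {"new-section": True}},
    "production": {"feature_toggles": {"new-section": False}},
}


def render_page(toggles):
    """Return the sections a user would see, newest first."""
    sections = []
    if toggles.enabled("new-section"):
        sections.append("new-section")
    sections.append("original-section")
    return sections


print(render_page(FeatureToggles(CONFIG, "development")))
print(render_page(FeatureToggles(CONFIG, "production")))
```

Because a missing environment or a misspelled toggle name resolves to off, a configuration mistake hides unfinished work instead of exposing it.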
  120. 120. Or you could create a web service for feature toggles.
        > curl http://feature.toggles/
        {
          "development": { "new-section": true },
          "production": { "new-section": false }
        }
  121. 121. It could return different, complex values for each user.
        > curl http://feature.toggles/?user=123
        {
          "development": { "new-section": "A" },
          "production": { "new-section": "B" }
        }
  122. 122. And provide a web UI for configuring toggles.
  123. 123. This allows you to quickly turn features on or off.
  124. 124. This allows A/B testing
        <% if toggles.get("new-section") == "A" %>
          <section id="new-section-bucket-a">
            <!-- Code for new section, version A -->
          </section>
        <% elsif toggles.get("new-section") == "B" %>
          <section id="new-section-bucket-b">
            <!-- Code for new section, version B -->
          </section>
        <% end %>
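For A/B testing, each user has to land in the same bucket on every request. One common way to get that, shown here as an assumption rather than the approach from the talk, is to hash the user ID together with the experiment name. The `assign_bucket` function is hypothetical.

```python
import hashlib


def assign_bucket(user_id, experiment, buckets=("A", "B")):
    """Deterministically map a user to a bucket for one experiment."""
    key = ("%s:%s" % (experiment, user_id)).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


# The same user always sees the same version of the new section.
first = assign_bucket(123, "new-section")
second = assign_bucket(123, "new-section")
print(first == second)  # True
```

Including the experiment name in the hash keeps bucket assignments independent across experiments, so the same users are not always grouped together.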
  125. 125. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  126. 126. A speedometer tells you how fast you’re driving
  127. 127. Monitoring tells you how your product is performing
  128. 128. “If you can’t measure it, you can’t fix it.” – David Henke
  129. 129. There are many types of monitoring
  130. 130. Availability metrics: is my product up or down?
  131. 131. Useful tools: Keynote, Pingdom, Uptime Robot, Route53
  132. 132. Business metrics: what are my users doing in the product?
  133. 133. Useful tools: Google Analytics, KISSMetrics, Mixpanel
  134. 134. Application metrics: how is my application performing?
  135. 135. Useful tools: New Relic, CloudWatch, Datadog
  136. 136. Log files are also a form of application-level monitoring
        127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /apache_pb.gif HTTP/1.0" 200 2326
        64.242.88.10 - - [07/Mar/2004:16:05:49] "GET /twiki/bin/ HTTP/1.1" 401 12846
        127.0.0.1 - - [28/Jul/2006:10:22:04] "GET / HTTP/1.0" 200 2216
        64.242.88.10 - - [07/Mar/2004:16:06:51] "GET /twiki/bin/Twiki/" 200 4523
        64.242.88.10 - - [07/Mar/2004:16:10:02] "GET /mailman HTTP/1.1" 200 6291
        127.0.0.1 - - [28/Jul/2006:10:27:32] "GET /hidden/ HTTP/1.0" 404 7218
        192.168.2.20 - - [28/Jul/2006:10:27:10] "GET /cgi-bin/try HTTP/1.0" 200 3395
        64.242.88.10 - - [07/Mar/2004:16:11:58] "GET /twiki/bin/view/" 200 7352
        64.242.88.10 - - [07/Mar/2004:16:20:55] "GET /twiki HTTP/1.1" 200 5253
  137. 137. Useful tools: Loggly, Logstash, Papertrail, Sumo Logic
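Log lines become a metric once they are parsed and counted. The sketch below counts HTTP status codes in four of the access-log lines from the slide. The regular expression and the `count_statuses` helper are assumptions for illustration, not part of any of the tools listed.

```python
import re
from collections import Counter

# Four of the sample access-log lines from the slide.
LOG_LINES = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '64.242.88.10 - - [07/Mar/2004:16:05:49] "GET /twiki/bin/ HTTP/1.1" 401 12846',
    '64.242.88.10 - - [07/Mar/2004:16:10:02] "GET /mailman HTTP/1.1" 200 6291',
    '127.0.0.1 - - [28/Jul/2006:10:27:32] "GET /hidden/ HTTP/1.0" 404 7218',
]

# The status code is the 3-digit number right after the quoted request,
# followed by the response size at the end of the line.
STATUS_PATTERN = re.compile(r'" (\d{3}) \d+$')


def count_statuses(lines):
    """Count how many responses were sent with each HTTP status code."""
    counts = Counter()
    for line in lines:
        match = STATUS_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = count_statuses(LOG_LINES)
print(dict(counts))  # {'200': 2, '401': 1, '404': 1}
```

Log aggregation tools do this kind of parsing continuously and at scale, which is what turns raw log files into dashboards.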
  138. 138. Server metrics: how is my server performing?
  139. 139. Useful tools: Nagios, Icinga, Munin, collectd, CloudWatch
  140. 140. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  141. 141. Warning lights notify you if something is wrong
  142. 142. Alerting systems notify you if something is wrong
  143. 143. You can’t look at metrics 24/7. Alerting systems can.
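At its simplest, an alerting system is a rule evaluated against a metric, plus a way to notify someone. The sketch below assumes a hypothetical rule ("more than 25% of recent responses are server errors") and a hypothetical `notify` callback standing in for a paging service.

```python
def check_error_rate(status_codes, threshold, notify):
    """Call notify(message) if the share of 5xx responses exceeds threshold.

    Returns True if an alert was sent.
    """
    if not status_codes:
        return False
    errors = sum(1 for code in status_codes if code >= 500)
    rate = errors / len(status_codes)
    if rate > threshold:
        notify("error rate %.0f%% exceeds %.0f%%" % (rate * 100, threshold * 100))
        return True
    return False


sent_alerts = []  # stand-in for a pager or chat integration
check_error_rate([200, 500, 500, 200], 0.25, sent_alerts.append)
print(sent_alerts)  # ['error rate 50% exceeds 25%']
```

A scheduler would run a check like this every minute or so, which is how the system can watch metrics around the clock when a person cannot.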
  144. 144. Useful tools: PagerDuty, VictorOps
  145. 145. For a full list of monitoring and alerting tools, see: hello-startup.net/resources
  146. 146. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  147. 147. Seat belts help you survive crashes
  148. 148. High availability helps you survive crashes
  149. 149. Stateless servers: multiple instances, multiple zones
  150. 150. Load balancer routes around server or zone outages
  151. 151. Auto-recovery mechanism brings server back after outage
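Routing around an outage can be sketched as round-robin selection that skips unhealthy servers. The `Server` and `RoundRobinBalancer` classes, the server names, and the zone names below are hypothetical; a real load balancer would set the `healthy` flag from periodic health checks.

```python
class Server:
    """One stateless instance of the app, running in some zone."""

    def __init__(self, name, zone, healthy=True):
        self.name = name
        self.zone = zone
        self.healthy = healthy


class RoundRobinBalancer:
    """Cycles through servers in order, skipping any that are down."""

    def __init__(self, servers):
        self.servers = servers
        self._next = 0

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers")


servers = [
    Server("web-1", "zone-a"),
    Server("web-2", "zone-b"),
    Server("web-3", "zone-a"),
]
balancer = RoundRobinBalancer(servers)

# Simulate an outage of the whole of zone-a.
for server in servers:
    if server.zone == "zone-a":
        server.healthy = False

print([balancer.pick().name for _ in range(3)])  # ['web-2', 'web-2', 'web-2']
```

Spreading instances across zones is what makes this work: with every server in zone-a, the same outage would leave nothing to route to.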
  152. 152. Stateful servers: multiple instances, multiple zones
  153. 153. Replication to one or more standby servers
  154. 154. Load balancer switches to standby server in case of outage
  155. 155. Auto-recovery mechanism brings server back after outage
  156. 156. Test your recovery process regularly.
  157. 157. Outline: 1. Brakes 2. Bulkheads 3. Autopilot 4. Safety catch 5. Speedometer 6. Warning lights 7. Seat belt
  158. 158. Speed is limited by safety
  159. 159. Two cars can drive at 80mph in opposite directions safely…
  160. 160. Because of two yellow lines
  161. 161. It’s worth the time to put these safety mechanisms in place
  162. 162. For more info, see Hello, Startup hello-startup.net
  163. 163. Questions?
  164. 164. Image credits: F1 racecar: Takayuki Suzuki; Highway traffic: Oran Viriyincy; Car accident: ER24 EMS (Pty) Ltd.; Road: Nicolas Raymond; BMW: Andy Durst; Self-driving car: Steve Jurvetson; Bus: Roland Tanglao; Tail lights: Tony Webster; USS South Dakota: Wikimedia; Crash test dummy: Wikimedia; Elisha Otis: Wikimedia; Otis Elevator: Wikimedia; Speedometer: Dawn Hopkins; Dashboard lights: Jim Larrison; Seat belt: Wikimedia; Google repo stats: Rachel Potvin; ISS: Wikimedia; Fire: Pete; Martin Fowler: Wikimedia
