The document discusses software testing and how to prevent defects. It recommends implementing various types of tests at different stages, including unit tests, integration tests, UI tests, system tests, and manual tests. The faster a test can run, the more often it should be run. Tests should run in parallel and be distributed to improve efficiency. Flaky tests waste time and hurt trust in the test suite, so they must be addressed promptly. Writing automated tests of various granularities helps enable fast development cycles and prevents regressions.
2. About Me
Airtable: Tech Lead, Storage and Caching
Scale AI: First staff-level engineering hire
Cloudera: Tech Lead, HDFS and ML platform
Berkeley: CS PhD track, distributed systems
UVa: B.S. Computer Science
8. Personal experiences
● Incorrectly parsing an old version of a file format, producing an
erroneous empty result
● Under-calculating how much data to flush to disk
● Full site outage caused by a rogue query, followed by broad database
corruption from bad database restart procedure
● A “save” button that would almost always throw a 500
9. How to prevent software defects?
● Typechecker
● Static analysis
● Unit tests
● Integration tests
● System tests
● UI tests
● Manual tests
● Performance tests
● Canary tests
● ...and more!
10. Time to fix a...
A. Compiler error? 1 second
B. Unit test failure? 1 minute
C. Manual QA issue? 2 hours
D. Customer issue? 5-10 hours
Write your answers in chat!
14. General principles
● Most of your test coverage should be fast and easy to run
● Write automated tests
● Write tests with different granularity
15. Unit tests
● Most granular type of testing
● Testing a single function, class, or component
● Narrow scope makes it easy to identify and isolate bugs
● Run fast (1 second)
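As a minimal sketch of what "a single function" with "narrow scope" can look like, the example below tests a hypothetical `parse_version` helper with Python's built-in `unittest`. The helper and its header format are assumptions made up for illustration; the third test is loosely inspired by the file-format bug mentioned earlier, where bad input produced an empty result instead of an error.

```python
import unittest


def parse_version(header: str) -> int:
    """Hypothetical helper: extract the version number from a header like 'v2'."""
    if not header.startswith("v") or not header[1:].isdigit():
        # Fail loudly rather than returning an empty or default result.
        raise ValueError(f"unrecognized header: {header!r}")
    return int(header[1:])


class ParseVersionTest(unittest.TestCase):
    def test_parses_current_version(self):
        self.assertEqual(parse_version("v2"), 2)

    def test_parses_old_version(self):
        self.assertEqual(parse_version("v1"), 1)

    def test_rejects_garbage_instead_of_returning_empty(self):
        with self.assertRaises(ValueError):
            parse_version("")


def run_suite() -> unittest.TestResult:
    """Load and run only the tests in ParseVersionTest."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseVersionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each test touches one function and no external state, a failure points directly at the broken behavior.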
16. Integration tests
● Tests multiple components together
● Multiple threads, processes, DBs, filesystem, etc
● Run in 10-100 seconds
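A rough sketch of an integration test that exercises a component together with a real database file on the filesystem. `NoteStore` is a hypothetical class invented for this example; SQLite and a temporary directory stand in for whatever DB and storage a real system would use.

```python
import os
import sqlite3
import tempfile


class NoteStore:
    """Hypothetical component under test: persists notes in a SQLite file."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
        )
        self.conn.commit()

    def add(self, body: str) -> int:
        cur = self.conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        self.conn.commit()
        return cur.lastrowid

    def get(self, note_id: int):
        row = self.conn.execute(
            "SELECT body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self.conn.close()


def check_notes_survive_reopen() -> bool:
    """Write through one store, reopen the same file, and read the note back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.db")
        store = NoteStore(path)
        note_id = store.add("hello")
        store.close()

        reopened = NoteStore(path)
        try:
            return reopened.get(note_id) == "hello"
        finally:
            reopened.close()
```

Unlike a unit test, a failure here could come from the SQL, the file handling, or the interaction between them, which is the trade-off for the wider coverage.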
17. UI tests
● Golden age of frontend development
● React is pretty testable
● Cypress is awesome
18. System tests
● Testing multiple services in a realistic environment
● Full end-to-end customer workflows
○ Create a resource
○ Use the resource
○ Delete it
● Tests things that are expensive or limited
○ Uses something that you only have one of
○ Calling external services
○ Expensive operations
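The create / use / delete workflow above can be sketched as follows. Everything here is hypothetical: `InMemoryClient` is only a stand-in so the sketch runs anywhere, whereas a real system test would point `run_workflow` at a client for the deployed service. The point of the sketch is the `try`/`finally`, which deletes the resource even when a step fails so that expensive or limited resources are not leaked.

```python
import uuid


class InMemoryClient:
    """Stand-in for a real service client (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.resources = {}

    def create(self, name: str) -> str:
        resource_id = str(uuid.uuid4())
        self.resources[resource_id] = {"name": name, "items": []}
        return resource_id

    def append(self, resource_id: str, item: str) -> None:
        self.resources[resource_id]["items"].append(item)

    def read(self, resource_id: str) -> list:
        return list(self.resources[resource_id]["items"])

    def delete(self, resource_id: str) -> None:
        del self.resources[resource_id]


def run_workflow(client) -> list:
    """Create a resource, use it, and always delete it afterwards."""
    # A unique name avoids collisions between concurrent test runs.
    resource_id = client.create("system-test-" + uuid.uuid4().hex[:8])
    try:
        client.append(resource_id, "row-1")
        return client.read(resource_id)
    finally:
        client.delete(resource_id)
```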
19. Manual tests
● Most flexible but also most expensive and slowest
● Less necessary these days, because of great testing libraries
● Generally want to avoid if possible
● Exceptions
○ During development
○ When there’s a site incident
○ The functionality is rarely used
○ The setup overhead is just too high (for now)
20. Continuous Integration
● Test every change
● Run different tests at different times, based on cost/speed
● Detect and identify bugs as early as possible
21. Continuous Integration
Stage          Additional Tests Run
Pre-commit     Unit + integration tests
Post-commit    UI tests
Nightly        System tests
Staging        Manual tests
Canary         Live user testing
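One way to read the table is that each stage adds tests on top of everything run at earlier stages. The sketch below encodes that reading; the cumulative interpretation and the suite names are assumptions, not something the slide spells out.

```python
# Stages in pipeline order, each with the suites it adds (names are illustrative).
STAGES = [
    ("pre-commit", ["unit", "integration"]),
    ("post-commit", ["ui"]),
    ("nightly", ["system"]),
    ("staging", ["manual"]),
    ("canary", ["live-user"]),
]


def suites_for(stage: str) -> list:
    """Return every suite run up to and including the given stage."""
    selected = []
    for name, added in STAGES:
        selected.extend(added)
        if name == stage:
            return selected
    raise KeyError(f"unknown stage: {stage}")
```

For example, `suites_for("nightly")` returns `["unit", "integration", "ui", "system"]`, so the cheapest suites run at every stage and the expensive ones run only later.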
22. Continuous Integration
● The faster the test suite, the more often you can run it
● My rule of thumb: getting a cup of coffee ☕
● Run tests in parallel and distribute them across machines
○ https://www.umbrant.com/2016/08/25/distributed-testing/
○ 60x improvement for Hadoop’s test suite, 8.5 hours -> 8 minutes
● Tests can be 💰💰💰, but are generally worth it
○ $100s/mo per developer
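For scale, 8.5 hours is 510 minutes, and 510 / 8 is roughly 64, which matches the quoted 60x. A minimal sketch of the underlying idea, splitting tests into shards and running the shards concurrently, is below. It uses threads on one machine purely for illustration; a real distributed runner would send shards to separate processes or hosts, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

TestFn = Callable[[], None]


def shard(names: List[str], num_shards: int) -> List[List[str]]:
    """Deal sorted test names round-robin into num_shards groups."""
    ordered = sorted(names)
    return [ordered[i::num_shards] for i in range(num_shards)]


def run_shard(tests: Dict[str, TestFn], names: List[str]) -> List[Tuple[str, bool]]:
    """Run one shard sequentially, recording pass/fail per test."""
    results = []
    for name in names:
        try:
            tests[name]()
            results.append((name, True))
        except AssertionError:
            results.append((name, False))
    return results


def run_parallel(tests: Dict[str, TestFn], num_shards: int) -> Dict[str, bool]:
    """Run all shards concurrently and merge their results."""
    shards = shard(list(tests), num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        shard_results = pool.map(lambda names: run_shard(tests, names), shards)
    return {name: ok for results in shard_results for name, ok in results}
```

Sharding only helps if tests do not depend on each other's state or ordering, which ties into the flakiness causes discussed later.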
23. Flaky tests
● Tests that spuriously fail x% of the time
● Can waste a lot of time triaging failures and retrying builds
● Kills trust in the test suite!
● Strategies
○ Temporarily disable flaky tests and fix with urgency
○ Make a dashboard of flaky rate per test
○ Track test flakiness over time to help bisect the suspect commit
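A dashboard of flaky rate per test needs some definition of "flaky". The sketch below assumes one possible definition: a failure counts as flaky when the same test also passed on the same commit. The input shape (tuples of test name, commit, pass/fail) and the function name are hypothetical.

```python
from collections import defaultdict


def flaky_rates(runs):
    """Compute flaky failures / total runs for each test.

    runs: iterable of (test_name, commit, passed) tuples.
    """
    outcomes = defaultdict(lambda: defaultdict(list))
    for test, commit, passed in runs:
        outcomes[test][commit].append(passed)

    rates = {}
    for test, by_commit in outcomes.items():
        total = sum(len(results) for results in by_commit.values())
        # Only count failures on commits where the same test also passed;
        # a test that fails every time on a commit is broken, not flaky.
        flaky_failures = sum(
            results.count(False) for results in by_commit.values() if True in results
        )
        rates[test] = flaky_failures / total
    return rates
```

Keeping the commit in the record also supports the last strategy: plotting the rate per commit over time shows where a test started to flake.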
24. Why do tests flake?
● Timing dependencies in multi-threaded applications
○ time.sleep() is a code smell
○ Use barriers/locks/condition variables instead
○ Use a FakeTicker class to advance system time
● Calling external services
○ Just don’t!
○ Spy on your HTTP/RPC libraries to detect errant network calls
● Leaked global state
○ Run tests individually in isolation
○ Run tests in a deterministic random order
○ Don’t use statics
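Two of the ideas above can be sketched briefly. First, a `FakeTicker` lets a test advance time explicitly instead of sleeping; the slide names the class but not its interface, so the `now`/`advance` methods and the `Cache` it drives are assumptions. Second, `shuffled_order` shows one way to get a random but reproducible test order by seeding the shuffle.

```python
import random
import time


class Cache:
    """Hypothetical TTL cache that takes its clock as a dependency."""

    def __init__(self, ttl_seconds, now=time.monotonic):
        self._ttl = ttl_seconds
        self._now = now
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (value, self._now() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._now() >= expires_at:
            del self._entries[key]
            return None
        return value


class FakeTicker:
    """Test clock: time moves only when the test calls advance()."""

    def __init__(self, start=0.0):
        self._t = start

    def now(self):
        return self._t

    def advance(self, seconds):
        self._t += seconds


def check_entry_expires():
    """Read an entry before and after its TTL without any real waiting."""
    ticker = FakeTicker()
    cache = Cache(ttl_seconds=60, now=ticker.now)
    cache.put("k", "v")
    before = cache.get("k")
    ticker.advance(61)
    after = cache.get("k")
    return before, after


def shuffled_order(test_names, seed):
    """Shuffle tests with a seed so a failing order can be replayed."""
    order = sorted(test_names)
    random.Random(seed).shuffle(order)
    return order
```

Logging the seed with each run means an order-dependent failure, which usually points to leaked global state, can be reproduced by rerunning with the same seed.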
25. Why tests are a developer’s best friend
● Fast Develop -> Test -> Debug loop
● Demonstrates that the code works
● Acts as a contract for the behavior of the code
○ Prevents other people from breaking your code
● Lets you fearlessly refactor the codebase
○ Prevents you from breaking other people’s code
26. What we didn’t cover
● Code review
● Design review
● Deploy process
● Monitoring and alerting
● Feature flags
27. Takeaway
● Write tests
● Write automated tests
● Write different kinds of tests
● Run your tests often
● Make the test suite fast
28. Resources
● Martin Fowler’s site: https://martinfowler.com/testing/
● JUnit docs: https://junit.org/junit5/docs/current/user-guide/#writing-tests
● Google Testing Blog: https://testing.googleblog.com/
● Uber: Keeping master green at scale
https://eng.uber.com/research/keeping-master-green-at-scale/
● Cindy Sridharan: Testing in Production, the safe way
https://copyconstruct.medium.com/testing-in-production-the-safe-way-18ca102d0ef1
● Automating safe, hands-off deployments (AWS):
https://aws.amazon.com/builders-library/automating-safe-hands-off-deployments/