This document collects the speaker notes for an introductory presentation on unit testing. The presentation covers the reasoning behind unit testing, what unit testing is, benefits such as better design and reduced cost, and how to write better tests. It also discusses test anatomy, demos writing tests, digs deeper into testing strategies, and covers what to test, test coverage, and the types of unit tests.
5. What is Unit Testing?
“In computer programming, unit testing is a method by which
individual units of source code are tested to determine if they
are fit for use. A unit is the smallest testable part of an
application” – Wikipedia
(emphasis added)
6. Why Unit test?
• Better Design
• Easier change
• Living documentation
• Reduces cost
Why do we test things?
• Integration testing
• UA testing
• Stress testing
The idea is to write small bits of code that exercise your production code in specific ways.
• Drives higher-fidelity requirements.
• Growing suite of smoke tests.
• Refactor confidence.
• Bug regression testing – bug found, create a test that replicates the bug, fix the bug, the test should pass, and the test lives on as a regression test.
• Requirement-change “tripwire” – notifies you if a change (planned or unplanned) causes a test failure.
• “A bug found earlier costs less” – time spent writing tests comes “from” time that would otherwise be spent debugging.
Arrange, Act, Assert (AAA):
• Arrange – get everything set up.
• Act – perform the action you’re interested in.
• Assert – verify the results match the expectations.
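The three AAA steps can be sketched as a tiny test. This is a hypothetical `ShoppingCart` class invented for illustration, shown in Python rather than C# to keep the sketch short:

```python
class ShoppingCart:
    """Minimal hypothetical class, used only to show the test structure."""
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


def test_total_sums_item_prices():
    # Arrange: get everything set up
    cart = ShoppingCart()
    cart.add("widget", 3.50)
    cart.add("gadget", 6.50)

    # Act: perform the action you're interested in
    result = cart.total()

    # Assert: verify the result matches the expectation
    assert result == 10.0
```

The blank lines between the three sections are deliberate – the structure itself documents what the test is doing.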
What should you test? The pithy answer is “everything” – which is terrible advice. Everything-coverage may be justified for you and your business, but unless you are designing life-support systems for NASA, I doubt it. Everything-coverage is high cost (time) and slows development. If the risks warrant that level of mitigation, so be it, but don’t bite that off lightly.

Should you test the framework? Again, maybe, but probably not. I feel very comfortable saying that anything you use from a System.* DLL has already been tested better than:
1. Anything you’ll need.
2. Anything you could add.
3. Any code you’ve already written.
I’m not saying the framework libraries are bug-free. I’m saying that the likelihood that you’ll find a bug is low, and the likelihood that you’d find that bug in a unit test is even lower.
Where do you think we start? For greenfield? New code? Existing code?
• Test as you go
• Test-Driven Development (TDD)
• Behavior-Driven Development (BDD)
Test as you go!
• Bug tests.
• Touched code gets tests from now on (new development or updates).
• Test the important bits – bang for your buck. But what is important?
At every level of a project, unit test coverage should be directly proportional to the likelihood of change and the criticality of the system. This isn’t a math equation or a strict rule, but you can think of it like one – though zero chance of change shouldn’t mean you never test a critical system. It is a feeling you can use.

• Importance – do you have a core subsystem that literally everything else depends on? Unit test the living daylights out of it.
• Likelihood of change – do you have a critical input that comes from a third party outside your influence/control? Test it to death. Is there an oft-updated bit that is constantly evolving due to user feedback?

You can even invent a quick scoring system if you’d like. The point is to start with the meaty bits of the solution, project, subsystem, class, etc., and work out from there.
Quantity is not quality – favor quality over coverage. Covered code != well-tested code. The code-coverage metric is only useful for highlighting untested code; it is NOT a measure of progress, completion, or confidence. Using test coverage as anything more gets the logic backwards – the implication only runs one way. Compare BMI: an obese person will have a high BMI, but a high BMI doesn’t mean a person is obese.
• Indirect input – anything provided to a SUT that isn’t a parameter of the invocation (e.g. database records).
• Indirect output – anything produced BY a SUT that isn’t returned to the caller.
• State verification – “at the end.” Inspect state after the SUT is exercised and compare it to the expectation. E.g. asserting that the DeletedDate property is set after you soft-delete an object.
• Behavior verification – asserting that something happens. Especially useful for indirect outputs, void returns, etc.: capture the indirect outputs as they occur and compare them to the expected behavior. E.g. asserting that the SaveChanges() method was called once and only once as part of the aforementioned soft delete.
• Delta assertion – a form of state verification comparing pre- and post-state; asserting that a certain change occurred.
• Guard assertion – fails the test if a condition isn’t satisfied. For example, a guard assertion could fail a test if an “if” branch you didn’t want exercised was entered.
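The soft-delete example can show both verification styles in one test. This is a sketch with hypothetical names (`DocumentService`, `soft_delete`, `save_changes`), using Python’s `unittest.mock` in place of a .NET mocking framework:

```python
from datetime import datetime, timezone
from unittest.mock import Mock


class DocumentService:
    """Hypothetical SUT: soft-deletes a record through an injected repository."""
    def __init__(self, repository):
        self.repository = repository

    def soft_delete(self, record):
        record.deleted_date = datetime.now(timezone.utc)
        self.repository.save_changes()  # indirect output: nothing is returned


def test_soft_delete_state_and_behavior():
    # Arrange
    repository = Mock()
    record = Mock(deleted_date=None)
    service = DocumentService(repository)

    # Act
    service.soft_delete(record)

    # State verification: inspect state "at the end"
    assert record.deleted_date is not None

    # Behavior verification: the save happened once and only once
    repository.save_changes.assert_called_once()
```

The first assertion never looks at the repository; the second never looks at the record – the two styles answer different questions about the same action.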
• Sweet spot A – pushing your code-level design and principles: SRP, DRY, YAGNI, etc.
• Sweet spot B – your regression testing, behavior assertion, etc.
• Positive testing – does this produce the expected output when given good inputs?
• Negative testing – does this produce the expected output when given bad inputs?
• Exception testing – does this fail gracefully if an exception is encountered? Most important for data-integrity issues.
• Boundary testing – pushing to the limits: stress/performance testing, full Unicode character sets, max string length, etc.
• Bug testing – write tests (which may be integration tests) that recreate a bug, then fix the code until the bug goes away. The test lives on as a regression test.
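A few of these types can be sketched against a single function. `parse_age` and its valid range are hypothetical, invented for illustration:

```python
def parse_age(text):
    """Hypothetical SUT: parses an age string, failing loudly on bad input."""
    value = int(text)  # raises ValueError for non-numeric input
    if not 0 <= value <= 150:
        raise ValueError(f"age out of range: {value}")
    return value


def test_parse_age_positive():
    # Positive: good input produces the expected output
    assert parse_age("42") == 42


def test_parse_age_negative():
    # Negative: bad input fails, and fails the way we expect
    try:
        parse_age("forty-two")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-numeric input")


def test_parse_age_boundary():
    # Boundary: push to the edges of the valid range
    assert parse_age("0") == 0
    assert parse_age("150") == 150
```

Each type gets its own test, so a failure immediately tells you which kind of input broke.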
• No conditionals (“if” or “switch”)
• No loops (“do”, “while”, “for”, “foreach”)
• No exception catching
To test code branches means writing multiple tests. Testing exception behavior is different.
We’re testing that the HasMultipleAccounts method is working properly. How do you run this twice? If it fails, which condition did it fail on? What controls whether the instance has multiple accounts?
We control the inputs, so we can be sure that the conditions are the same for every execution. Different branches are tested with different test vectors.
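One branch per test, with a controlled input for each, might look like this. The `Customer` class is a hypothetical stand-in for the demo code, sketched in Python:

```python
class Customer:
    """Hypothetical SUT: we control the accounts list, so the branch taken
    is fully determined by the test, not by hidden state."""
    def __init__(self, accounts):
        self.accounts = accounts

    def has_multiple_accounts(self):
        return len(self.accounts) > 1


def test_has_multiple_accounts_true_branch():
    # Test vector chosen to exercise the "multiple accounts" branch
    customer = Customer(accounts=["checking", "savings"])
    assert customer.has_multiple_accounts() is True


def test_has_multiple_accounts_false_branch():
    # A different test vector exercises the other branch
    customer = Customer(accounts=["checking"])
    assert customer.has_multiple_accounts() is False
```

No loops, no conditionals in the tests themselves – each branch of the SUT gets its own atomic test.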
Deterministic – same results every time, assuming no code changed.
• Avoid things like Random and DateTime.Now().
• Avoid external systems.
• Consistent – if you run it 1,000 times, on different configuration files, different machines, etc., it should always give you the same result.

Randomness is tempting. It seems like using Random would force you to write tests that truly flex the system – you can’t write a test that only works for the narrow dataset you give it if the data is random. But Random truly is random.
Will fail ~20% of the time.
Self-descriptive – readability is king.

In unit tests, readability is king. In production code you sometimes have to make decisions that hurt readability (e.g. performance tuning), but unit tests always, always, always err on the side of readability. If you have poorly performing unit tests, you have other issues – you don’t tune a unit test. Remember, future-you will want different information than present-you: you want to see different things when you’re getting the code working for the first time, when it is passing, when it breaks, when you refactor, etc.

Unit tests are living documentation – docs for developers, and behavior specifications that are always up to date (unlike comments). Self-description includes:
• Test organization
• Naming (classes, methods, variables, etc.)
• Simple code
• Informative assertion messages
Atomic – keep tests small. Only two possible results: pass or fail. No partial successes. If you need to run the debugger and step into a unit test to figure out why/where a test is failing, your test probably isn’t atomic. Use no more than a handful of assertions – ideally one. Avoid gobs of assertions in your tests, as this leads to needing the aforementioned debugger. Note that this does not preclude data tests, where the same method is executed repeatedly by the framework with different inputs.
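The data-test exception might look like this, sketched with `unittest` sub-tests (the C# equivalents would be MSTest’s DataRow or NUnit’s TestCase; `is_leap_year` is a hypothetical SUT):

```python
import unittest


def is_leap_year(year):
    """Hypothetical SUT for the data-driven example."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTests(unittest.TestCase):
    def test_known_years(self):
        # One logical test, run by the framework against several inputs.
        # Still atomic: each (input, expected) pair passes or fails on its own,
        # and the framework reports which vector failed.
        cases = [(2000, True), (1900, False), (2024, True), (2023, False)]
        for year, expected in cases:
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)
```

The loop lives in the framework’s hands, not in ad-hoc test logic, so the “no loops” guidance above is preserved in spirit.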
The static factory method creates randomized instances, so this test has a conditional loop that keeps creating them until we get the kind of random instance we want.

Is this going to be fast? Maybe, maybe not – it depends on the number of loops needed. Is it self-descriptive? Not in the slightest – what is this testing? How about deterministic – will this run the same way every time? Sort of: the while loop should ensure that the rest of the method passes, but the exact path will vary on every execution. Is it atomic? Sort of: this is a fairly small test, but you can easily imagine a more convoluted setup where you have to stand up a lot of dependencies and whatnot. I would say the number of assertions violates the atomic principle – you’ve got five assertions on two objects. If this fails, will you immediately know why? Ideally you shouldn’t even need to look at the test to know what happened, if the test is properly descriptive. Multiple assertions are okay so long as the test preserves…
Each test is responsible for one scenario only. It may need multiple assertions to fully verify that scenario; balance this with the Simple characteristic. Multiple assertions are fine so long as they verify the same behavior and don’t violate atomicity – if you need the debugger to understand how/why a test failed, it’s a bad test. A single behavior that spans multiple methods (private methods, properties, etc.) can be tested with one test; a single method that has multiple behaviors should be tested with multiple tests. Per the Wikipedia definition, “a unit is the smallest testable part of an application” – NOT necessarily a single method or class.
Tests should be isolated from:
• Environment – this sort of extends Consistent. The success/failure of a test shouldn’t depend on the state of an external system, like a database.
• Other TESTS – the success/failure of a test shouldn’t depend on other tests. Be mindful of instance variables; different frameworks do different things.
• Other classes – isolate from dependencies by mocking them. We’ll come back to this.

A moment on ordered tests. You can create ordered tests, in which a prescribed execution order is preserved. This can very easily lead to some bad tests. I’ve seen it used once in a way that may be acceptable – to bridge atomicity with SRP: a series of tests for a single behavior were ordered so that their assertions worked progressively deeper into the behavior, and each test was very atomic and self-descriptive, yet all were testing a single behavior. The ordering preserved the narrative. You can also argue that the behavior being tested should have been refactored so this wasn’t needed, in which case ordered tests really only belong in integration testing.
Unit tests are never deployed to production, so there should be no “test hooks” in production code. You will see .ctor overloads added just for testing – that’s bad.
Automate everything: execution, evaluation, summarization, and results distribution. If it is a manual process, it won’t happen. Use gated builds, run tests on builds, etc. – let the framework do the lifting for you. For results distribution there are lots of options: RSS, email, SMS, TFS work items – whatever your process is, automate it.
Test suites should be kept “per business module,” and tests follow the tested code. If you share a project between solutions, the tests should go with that project and be executable without the other projects.
Recall: indirect input is anything provided to a SUT that isn’t a parameter of the invocation (e.g. database records); indirect output is anything produced BY a SUT that isn’t returned to the caller. Test doubles eliminate testing/environment/experiment variables, not programming variables.
Everyone has different terms. xUnit Patterns terminology: http://xunitpatterns.com/Mocks,%20Fakes,%20Stubs%20and%20Dummies.html

• Dummies are never actually used – they are just passed around filling out parameter lists. Passing null to a parameter that isn’t used in a particular unit is a dummy. A better practice for a dummy is to throw exceptions from every method, so you can ensure it isn’t invoked by your SUT. (No behavior, never called; no indirect input or output.)
• Stubs provide indirect inputs. They are usually hardcoded, or configured in the test, to return the same responses regardless of the SUT’s input. They ignore indirect output.
• Spies verify indirect output – they capture it for later verification, and may optionally provide indirect inputs.
• Fakes have working implementations, but do something that makes them not production code, like using an in-memory database. They do not offer a control point to the test, and may be stateful. (No indirect input; uses indirect output.)
• Mocks are useful for verifying behavior, which is important for indirect outputs. Something like a method that saves changes to the database and returns void has indirect outputs; a classic example is a call to a logger when an exception is thrown. With mocks, you set up expectations and then verify that they are met. (Can provide indirect input; verifies correctness against expectations.)
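A stub feeding indirect input and a mock verifying indirect output can be sketched together, here with Python’s `unittest.mock` standing in for a .NET framework (the `notify_overdrawn`/`mailer` names are hypothetical):

```python
from unittest.mock import Mock


def notify_overdrawn(account_id, balance_lookup, mailer):
    """Hypothetical SUT: mails a warning when an account is overdrawn."""
    if balance_lookup(account_id) < 0:
        mailer.send(account_id, "Your account is overdrawn")


def test_overdrawn_sends_mail():
    # Stub: provides indirect input -- a hardcoded response, regardless of args
    balance_lookup = Mock(return_value=-50)

    # Mock: records calls so we can verify indirect output afterward
    mailer = Mock()

    notify_overdrawn("acct-1", balance_lookup, mailer)

    # Behavior verification: the indirect output happened exactly once,
    # with the expected arguments
    mailer.send.assert_called_once_with("acct-1", "Your account is overdrawn")


def test_positive_balance_sends_nothing():
    balance_lookup = Mock(return_value=100)
    mailer = Mock()

    notify_overdrawn("acct-1", balance_lookup, mailer)

    mailer.send.assert_not_called()
```

The stub never gets verified and the mock never supplies data – keeping the roles straight makes the test’s intent obvious.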
There are lots of frameworks with lots of differences; you’ll need to find the one that works for you and your team. I’ve been using Moq for a couple of years. A couple of the reasons I like it: it’s type safe, uses lambda syntax, and has no record/replay paradigm. Those features have become a lot less unique in the years since, but I’ve stuck with Moq because I know it and it hasn’t given me any problems. Moq (pronounced “mock” or “mock-you”) is a test-double framework, and it can be used to create and configure all four of the test doubles we just talked about. This, along with my tendency to refer to all doubles as “mocks,” creates some confusion. If you’re not clear on something or I misspeak, let me know and I’ll clarify.
Classic TDD trends toward stubs/fakes/dummies because it uses state verification – it doesn’t care about implementation or behaviors, only final state. Mockist testing tends to mock everything. The trade-offs: classic fakes/stubs take time to set up and are reused by many tests, so bugs in the fake/stub implementation can be tough to track down, far-reaching, and can mask real production bugs. Mocks can get complicated to set up properly and are typically configured per-test. They are also inherently more implementation-driven, which couples them more tightly to the code they test.