A collection of workshops on
- CD pipeline architecture and design tactics for the testability quality factor
- Technical practices and tips for team up-skilling
- TDD sources and materials
3. These slides contain a broad introduction to automated testing in CD pipelines.
We’ll look at the types of xUnit tests we write, and how they help us model system specification:
- Unit testing and TDD
- Integration testing and PACT (CDC)
- Acceptance testing and ATDD
We’ll also touch on:
- Team up-skilling tips
- Architectural challenges
- Suggested materials on automated testing
- Debates around TDD are included for reference, but that is another talk in itself!
SCOPE
3
4. Neal Ford describes Continuous Delivery as the ‘authoritative,
repeatable, automated checking of production readiness’ [Ford,
2015]
TESTABILITY QUALITY FACTOR AND CD
4
5. [DZone, 2015] provides a very useful introduction, ‘CD Visualised’, which
covers the testing technologies (both automated and manual)
used within CD pipelines [pdf]
We will concentrate on xUnit tests run within commit and
automated acceptance testing stages in the pipeline.
CD OVERVIEW
5
7. [Cohn, 2010] chapter 16 introduces the test automation pyramid
– We use this metaphor to describe the various approaches to automated testing
– Cohn was drawing a contrast between Agile (test-early) and waterfall (test-late) approaches
– Many projects had become reliant upon large suites of slow, brittle UI automation tests, written after development work was complete
THE COHN TEST AUTOMATION PYRAMID
7
“One reason teams found it difficult to write tests sooner
was because they were automating at the wrong level. An
effective test automation strategy calls for automating
tests at three different levels”
8. Watirmelon – ice cream cone anti-pattern
- Various visual metaphors
- Implies the concepts in xUnit are lost
- Late defects are generally more costly
TEST ANTI-PATTERN METAPHORS
8
10. “Inspection does not improve the quality, nor guarantee quality.
Inspection is too late. The quality, good or bad, is already in the
product. As Harold F. Dodge said, ‘You can not inspect quality
into a product.’”
[Deming, 1986]
BUILD QUALITY IN
10
11. TEST SCOPE - EXAMPLE SYSTEM
Do we test the entire system, fully, end to end?
11
12. PIPELINE STEPS
• Code quality (lint)
• Build / transpile
• Unit tests
• Integration tests (sandbox)
• Deploy to E2E runtime
• Journey tests against E2E
12
14. We prefer unit testing because:
- Reliability - 100% deterministic
- Dependencies - simple components
- Scope - a single class or component
- Isolation - tests for this component do not affect others
- Mocks - use a spy to validate that actions are called
UNIT TESTS
14
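The properties above can be shown in a minimal sketch (Python here for brevity, though the deck’s own examples use NUnit/NSubstitute; `AccountService` and the notifier are hypothetical names):

```python
from unittest.mock import Mock

class AccountService:
    """Hypothetical component under test: notifies when a balance goes negative."""
    def __init__(self, notifier):
        self.notifier = notifier  # dependency is injected, so a test double can stand in

    def apply_debit(self, balance, amount):
        new_balance = balance - amount
        if new_balance < 0:
            self.notifier.send_overdraft_alert(new_balance)
        return new_balance

# The test is 100% deterministic: no network, no clock, no shared state.
notifier_spy = Mock()                 # spy records calls for later verification
service = AccountService(notifier_spy)

assert service.apply_debit(100, 150) == -50
notifier_spy.send_overdraft_alert.assert_called_once_with(-50)  # validate the action was called
```

The spy verifies the interaction without any real notification infrastructure, which is what keeps the test isolated and repeatable.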
16. PACT (CONSUMER DRIVEN CONTRACTS)
The design of the pipeline and PACT repository represents a movement towards ‘architecture for specification’. Specification is a
key architectural concern, critical to pipeline quality factors.
16
17. This is a valid tactic when faced with legacy core systems. Can the interceptor validate its stub content against
the real system? (Probably not in a repeatable manner)
SMART STUBS CAN MODEL COMBINATIONS OF FAILURE
17
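A ‘smart stub’ in this sense is an active fake that tests can put into chosen failure modes. A hypothetical sketch of the idea:

```python
class SmartStub:
    """Stub for a legacy core system that can simulate combinations of failure."""
    def __init__(self):
        self.failure_modes = set()

    def arrange_failure(self, mode):
        self.failure_modes.add(mode)  # e.g. "timeout", "malformed_body"

    def get_account(self, account_id):
        if "timeout" in self.failure_modes:
            return {"status": 504}
        if "malformed_body" in self.failure_modes:
            return {"status": 200, "body": "<not json>"}
        return {"status": 200, "body": {"id": account_id, "balance": 8}}

stub = SmartStub()
assert stub.get_account("17")["status"] == 200

stub.arrange_failure("timeout")   # tests can now exercise the unhappy path on demand
assert stub.get_account("17")["status"] == 504
```

Because the stub’s failure modes are arranged by the test itself, unusual combinations become repeatable rather than waiting for the real system to misbehave.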
18. PACT SPECIFICATION (CONSUMER DRIVEN CONTRACT)
{
  "provider": {
    "name": "Animal Service"
  },
  "consumer": {
    "name": "Zoo App"
  },
  "interactions": [
    {
      "description": "a request for an alligator",
      "provider_state": "there is an alligator named Mary",
      "request": {
        "method": "get",
        "path": "/alligators/Mary",
        "headers": { "Accept": "application/json" }
      },
      "response": {
        "status": 200,
        "headers": { "Content-Type": "application/json;charset=utf-8" },
        "body": { "name": "Mary" }
      }
    },
    {
      "description": "a request for an alligator",
      "provider_state": "there is not an alligator named Mary",
      "request": {
        "method": "get",
        "path": "/alligators/Mary",
        "headers": { "Accept": "application/json" }
      },
      "response": {
        "status": 404
      }
    }
  ]
}
18
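The same pact file serves two roles: a stub for the consumer (‘Zoo App’) and a specification for the provider (‘Animal Service’). A minimal sketch of that idea in Python — this is not the real pact-broker or pact library API, just the shape of it:

```python
# A trimmed version of the pact above, as a Python dict
pact = {
    "interactions": [
        {
            "description": "a request for an alligator",
            "request": {"method": "get", "path": "/alligators/Mary"},
            "response": {"status": 200, "body": {"name": "Mary"}},
        }
    ]
}

def find_interaction(pact, method, path):
    """Consumer side: the pact acts as a stub, replaying recorded responses."""
    for interaction in pact["interactions"]:
        req = interaction["request"]
        if req["method"] == method and req["path"] == path:
            return interaction["response"]
    return {"status": 404}  # no recorded interaction matches

def verify_provider(pact, call_provider):
    """Provider side: the pact acts as a spec, replayed against the real service."""
    for interaction in pact["interactions"]:
        actual = call_provider(interaction["request"])
        assert actual["status"] == interaction["response"]["status"], interaction["description"]

# Consumer tests run against the stub...
assert find_interaction(pact, "get", "/alligators/Mary")["status"] == 200
# ...and the same file later verifies the provider (a fake provider here, for illustration).
verify_provider(pact, lambda request: {"status": 200, "body": {"name": "Mary"}})
```

Recording the contract as plain data is what makes it a stub for A and a spec for B at the same time.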
21. Neal Ford describes Continuous Delivery as the ‘authoritative,
repeatable, automated checking of production readiness’ [Ford,
2015].
In essence, the quality factor of pipelines is that they must be able to
repeatedly and reliably demonstrate that our software is production
ready.
TESTABILITY QUALITY FACTOR AND CD
21
22. DESIGN HEURISTIC
“… for a system to meet its acceptance criteria to the satisfaction of
all parties, it must be architected, designed and built to do so - no
more and no less.”
[Rechtin, 1991]
22
23. Neal Ford argues that -
• Architects should be responsible for constructing the deployment pipeline
• It is an architectural concern to decide the number of stages for the
deployment pipeline
Continuous Delivery for Architects [Ford, 2014]
PIPELINE AS AN ARCHITECTURAL CONCERN
23
24. TESTABILITY PAYOFF
“Industry estimates indicate that between 30 and 50 percent (or in some
cases, even more) of the cost of developing well-engineered systems is
taken up by testing. If the software architect can reduce this cost, the
payoff is large.”
[Bass et al, 2013]
24
25. [IEEE 1990] defines testability as -
• The degree to which a system or component facilitates the establishment of
test criteria and the performance of tests to determine whether those criteria
have been met
• The degree to which a requirement is stated in terms that permit
establishment of test criteria and performance of tests to determine whether
those criteria have been met
TESTABILITY QUALITY FACTOR AND CD
25
28. FOWLER ON NONDETERMINISM
• In order to get tests to run reliably, we must have clear control over the system state at the
beginning of the test
• Some people are firmly against using test doubles in functional tests, believing that you must
test with a real connection in order to ensure end-to-end behaviour
• However, automated tests are useless if they are non-deterministic. Any advantage you
gain by testing against the real system is negated by non-determinism
• Often remote systems don't have test systems we can call, which means hitting a live system. If
there is a test system, it may not be stable enough to provide deterministic responses.
http://martinfowler.com/articles/nonDeterminism.html
28
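One common source of nondeterminism is time; injecting the clock gives the test clear control over system state at the start. A hypothetical sketch:

```python
class SessionChecker:
    """Hypothetical component: decides whether a session has expired."""
    def __init__(self, clock):
        self.clock = clock  # inject the clock instead of calling time.time() directly

    def is_expired(self, started_at, ttl_seconds):
        return self.clock() - started_at > ttl_seconds

# The nondeterministic version would read the real clock; the test pins it instead.
fixed_clock = lambda: 1_000  # test double: always returns the same instant
checker = SessionChecker(fixed_clock)

assert checker.is_expired(started_at=0, ttl_seconds=500) is True
assert checker.is_expired(started_at=900, ttl_seconds=500) is False
```

The same injection tactic applies to random seeds, environment state, and remote calls: the test owns every input, so the result is the same on every run.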
31. Generally, we can think of a test (a specification) as a single state machine sequence
IDEAL TEST (SPECIFICATION)
31
32. The goal is to construct systems from high-quality components.
Coarse-grained tests:
- are more complex
- have more dependencies
- are harder to understand
- are harder to write
- provide poor defect localisation
- must model more states
BUILD QUALITY IN / SOLID
32
33. “Setting and examining a program's internal state is an aspect
of testing that will figure predominantly in our tactics for
testability”
[Bass et al, 2013]
TESTABILITY TACTICS
33
34. BASS ET AL - TESTABILITY QUALITY FACTOR AND TACTICS
34
35. HOLISTIC SYSTEM DESIGN APPROACH
“The test setup for a system is itself a system” [Rechtin, 1991]
35
37. PIPELINE REPEATABILITY AND RELIABILITY TACTICS
- Test in isolation (depend on specification, not an actual system)
- Systems at the boundary should expose canonical stubs or test harnesses
- Most downtime relates to external system outages. Shared (critical) services should
utilise blue/green deploy to minimize downstream impact
- The repeatability of acceptance tests concerns nondeterminism, and the repeatability of
state scenarios at our boundary
- Apply Record and Replay stubbing approaches to make legacy connected systems
repeatable, and to make specifications amenable to source control
37
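The Record and Replay tactic can be sketched as a wrapper that captures a live response once and then serves it deterministically; because the recording is plain data, it can be committed to source control and diffed. All names here are hypothetical (real tools in this space include VCR-style libraries):

```python
import json

class RecordReplayStub:
    """Record mode captures real responses; replay mode serves them deterministically."""
    def __init__(self, real_call=None, recording=None):
        self.real_call = real_call
        self.recording = recording if recording is not None else {}

    def call(self, request_key):
        if request_key in self.recording:
            return self.recording[request_key]        # replay: repeatable, offline
        if self.real_call is None:
            raise KeyError(f"no recording for {request_key}")
        response = self.real_call(request_key)        # record: hit the legacy system once
        self.recording[request_key] = response
        return response

# Record once against the (slow, flaky) legacy system...
legacy = lambda key: {"status": 200, "body": f"live answer for {key}"}
stub = RecordReplayStub(real_call=legacy)
first = stub.call("/accounts/17")

# ...then replay from the saved recording, with no legacy dependency at all.
saved = json.loads(json.dumps(stub.recording))  # round-trip: the recording is just JSON
replayed = RecordReplayStub(recording=saved)
assert replayed.call("/accounts/17") == first
```

The JSON round-trip is the point: the captured specification is diffable text, not a live connection.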
39. PIPELINES AND LEGACY ARCHITECTURE
Legacy architecture presents several challenges -
• APIs and services expose consumers to fully-connected, end-to-end shared
environments; depending on the nature of a service's underlying state, tests were
not repeatable, and availability was low
• Integration tests written against shared fixtures were typically brittle, slow, and
exhibited low levels of verification, limiting their value to a simple litmus test of "the
service is on”
• Shared data was frequently cannibalised by external teams, causing integration
tests within pipelines to fail
39
40. PIPELINES AND LEGACY ARCHITECTURE
• Even though some service teams had paid close attention to architectural guidelines,
those guidelines did not cover testability
• There was a ‘release mismatch’ across silos, where teams had conflicting technical
practices (e.g. it isn’t possible to push into master if another team is using long-lived
branching, rather than feature switching)
• Specification approaches were ad-hoc and fragmented; no-one could say ‘let’s run the new
scenario for all 17 types of account’
• Specification systems were a by-product, not designed, so presented a barrier to the
addition of new types of product
• Some tests used NUnit, some used SoapUI, some relied on intercepting active fake
systems
40
41. • Include developers and QAs
• Pick a code kata
• Provide a free-lunch incentive!
• Break down pairing barriers
• Build deeper understanding by practice
• Cover concepts like test-first, first-gear
TDD, evolutionary design, SOLID, GitHub
Flow and Feature Switching
DOJOS
41
42. • Dojos drive technical practices, but also
break down fear barriers
• Day-to-day, spread knowledge by pairing
more experienced developers with less
experienced developers
• Developers and QAs pair to meet
acceptance criteria, pushing testing down
the pyramid where more extensive
coverage is required
• Collective code ownership
• Deming - focus on quality and
craftsmanship / drive out fear!
TECHNICAL PRACTICES
42
43. [Cohn, 2010] key takeaways:
- Focus on continuous improvement of engineering practices
- Testing should be a whole-team responsibility; it should not be
delegated to ‘experts’ in the testing field
- Use the innate skills within the team to solve quality problems
“… the tester creates automated tests and the programmer programs.
When both are done the results are integrated. Although it may be
correct to still think of there being hand-offs between the programmer
and tester, in this case, the cycle should be so short that the hand-offs
are of insignificant size.”
TEAM EMBEDDING FOR TECHNICAL PRACTICES
43
Story lifecycle (slide diagram): Story Kick Off (shared understanding; writing testable stories) → Unit Testing, Integration Testing, Work with QA to write Acceptance Tests, Drive Acceptance Testing, Exploratory Testing → Story Handover (minimizing bugs; Product Owner signs off)
“Avoid working in a micro-waterfall approach, with distinct analysis,
design, coding and testing phases within a sprint.”
“The hand-offs between programmers and testers (if they exist at all) will
be so small as not to be noticeable.”
“There should be as much test activity on the first day of a sprint as on
the last day.”
“Testers may be specifying test cases and preparing test data on the first
day and then executing automated tests on the last, but are equally busy
throughout.”
44. INVEST
Independent - self contained
Negotiable - can be changed until in play
Valuable - value for the end user
Estimatable* - well enough defined to be estimated
Small - easy to plan / prioritise
Testable - story must provide test criteria
*Not sure this is really a word, they just made it up
STORY WRITING FOR SPECIFICATION
44
45. Feature: Quick Balance
As a Customer
I would like to view my Quick Balance without having to login
So that I can check my available balance more quickly
STORY WRITING FOR SPECIFICATION
45
46. In order to construct an acceptance test, we need to define the criteria
Acceptance Criteria:
Given I am on the Home Screen and not logged in
And I have a valid Account
When I swipe left
Then I can view my Quick Balance
Given I am on the Home Screen and not logged in
And I have cancelled my Account
When I swipe left
Then I see an Error Message “Quick Balance not available.”
STORY WRITING FOR SPECIFICATION
46
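Each Given/When/Then criterion maps directly onto the arrange/act/assert shape of an automated test. A hypothetical sketch in Python (the deck’s real tests use NUnit):

```python
class QuickBalanceScreen:
    """Hypothetical app facade for the Quick Balance feature."""
    def __init__(self, account_status, available_balance):
        self.account_status = account_status
        self.available_balance = available_balance

    def swipe_left(self):
        if self.account_status == "valid":
            return {"quick_balance": self.available_balance}
        return {"error": "Quick Balance not available."}

# Given I am on the Home Screen and I have a valid Account (arrange)
screen = QuickBalanceScreen(account_status="valid", available_balance=4.00)
# When I swipe left (act)
result = screen.swipe_left()
# Then I can view my Quick Balance (assert)
assert result == {"quick_balance": 4.00}

# Given I have cancelled my Account, Then I see the error message
cancelled = QuickBalanceScreen(account_status="cancelled", available_balance=0)
assert cancelled.swipe_left() == {"error": "Quick Balance not available."}
```

Writing the criteria in this form first is what makes them executable later: each clause becomes a line of the test.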
47. LET’S CODE THAT FEATURE IN AN ATDD CYCLE
The cycle of growing software, guided by tests [Freeman-Pryce 2009; 40]. (I’ve added a Git
commit path☺)
47
48. ACCEPTANCE TEST (API TEST)
We add a failing acceptance test, to call the API and validate the response
[SetUp]
public void Setup()
{
    client = new HttpClient();
    client.DefaultRequestHeaders.Add("userToken", UserFixture.Token);
}

[Test(), Description("Given a logged in user, Quick Balance returns AvailableBalance.")]
public void GivenUserLoggedIn_QuickBalance_ReturnsAvailableBalance()
{
    var response = client.PostAsync(UrlHelper.QuickBalance, null).Result;
    Assert.IsNotNull(response);
    Assert.AreEqual(HttpStatusCode.OK, response.StatusCode);
    var jsonResult = response.Content.ReadAsStringAsync().Result;
    Assert.IsTrue(jsonResult.Contains("AvailableBalance"));
}
48
49. UNIT TEST
[SetUp]
public void Setup()
{
    acctRepo = Substitute.For<IAccountRepository>(); // unit tests stub dependencies
    identity = Substitute.For<IIdentity>();
}

[Test(), Description("Balance returns BadRequest for invalid loginId")]
[TestCase("")]
[TestCase(null)]
public void GetBalance_InvalidLoginId_ReturnsBadResponse(string invalidLoginId)
{
    // ARRANGE
    identity.LoginId.Returns(invalidLoginId);
    // Repeated constructor is a DRY fail? Test is doing work of a container?
    var subjectUnderTest = new QuickBalanceController(acctRepo, identity);
    // ACT
    var response = subjectUnderTest.GetBalance();
    // ASSERT
    Assert.AreEqual(HttpStatusCode.BadRequest, response.StatusCode);
}
49
52. Advanced Unit Testing - Mark Seemann [Seemann, 2013]
- Test readability and “DRY vs. DAMP”
- Red, Green, Refactor and trusting tests
- Simple coding guidelines for test readability
- SUT management and test fixture management patterns
Automated Testing: End to End – Jason Roberts [Roberts, 2013]
Basic economics of testing and the test pyramid
Unit testing (Module 2)
Integration testing (Module 3)
TeamCity pipeline design (Module 5)
TESTING MATERIALS
52
53. xUnit Test Patterns - Refactoring Test Code
Gerard Meszaros, Addison-Wesley, 2007 [xUnit Test Patterns]
Amazon reviews -
- The book is about the patterns used in the design of
software systems… it’s the book all architects and technical
leads should read!
- This book is to xUnit what the ‘Gang of Four’ book is to
object-oriented design
- Its content is available online http://xunitpatterns.com
- As its subtitle suggests – Refactoring Test Code –the book
details simple goals and patterns for incremental
improvements at code level
- BUT it also discusses wider scale architectural anti-
patterns, e.g. testing against shared databases.
TESTING MATERIALS
53
55. Growing Object-Oriented Software, Guided by
Tests
Steve Freeman and Nat Pryce, Addison-Wesley,
2009 [Freeman-Pryce 2009]
- Good reference on all aspects of testing
- Explains mocks and test in isolation
- ATDD
- End-to-end testing of event driven systems
TESTING MATERIALS
55
57. Use visible metrics to drive continuous
improvement of technical practices…
… but, over-reliance on metrics is an anti-
pattern, e.g. James Coplien - Why Most Unit
Testing is Waste
We should be confident that all acceptance
criteria have been met, and the risk of system
failure is low.
An interesting metric is cycle time, as well as
the number of defects found and fixed in dev,
test, staging and production environments.
TEST METRICS ARE USEFUL… BUT DANGEROUS
57
58. UNCLE BOB
Another well-quoted adage of test-driven development is the three rules of TDD –
1. You must write a failing test before you write any production code
2. You must not write more of a test than is sufficient to fail, or fail to
compile
3. You must not write more production code than is sufficient to make the
currently failing test pass
This is often interpreted as ‘first-gear’ TDD, with 100% test coverage. You would write a failing
test before you write any code that introduces new specifications or modifies existing
behaviour. You can use first-gear TDD if you want to… but it should be optional.
58
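A first-gear cycle under the three rules looks like this: a tiny failing test, then just enough production code to pass, then the next test. A hypothetical kata-style sketch:

```python
# Rules 1 and 2: write just enough of a test to fail...
def test_empty_string_scores_zero():
    assert score("") == 0

# Rule 3: ...then just enough production code to make it pass.
def score(rolls):
    return sum(int(r) for r in rolls)

# The next red/green increment: a new failing test drives the next behaviour.
def test_rolls_are_summed():
    assert score("53") == 8

test_empty_string_scores_zero()
test_rolls_are_summed()
```

Each increment is deliberately tiny; whether you stay in first gear or take bigger steps is, as the slide says, optional.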
59. Ian Cooper - Where has TDD Gone Wrong? http://vimeo.com/68375232 [Cooper, 2013]
■ Discussion of Kent Beck, and how TDD has been misinterpreted by the community
■ Key issue is the over-coupling between implementation and tests (accidental over-specification)
■ Discussion of testing anti-patterns (‘ice cream cone’) and test pyramid (see 35.00)
■ Hexagonal architecture and unit testing at port boundaries, to produce tests that verify external
component specification (see 42.20)
■ Gears – suggestion that we may write finer tests to incrementally grow a complex algorithm and
then perhaps throw them away, as they are expensive to maintain [45.30]
■ We should retain only external specification tests [45.00]
■ We must focus on writing tests against behaviours (acceptance criteria), rather than method
level testing, i.e. test at correct level of granularity and focus on component specification
(engineering practice) [48.00]
■ A weakness of ATDD is that it relies on an engaged business stakeholder [51.00]
FIRST GEAR TDD AND ATDD HAVE THEIR DETRACTORS
59
60. TEST-DRIVEN DEVELOPMENT BY EXAMPLE (BECK)
Do we need to re-interpret Kent Beck?
- The original book on test-first development
- Focus on xUnit tools
- Code examples, and simple patterns and
techniques for test writing and refactoring
Test-Driven Development By Example
Kent Beck, Addison-Wesley, 2002
60
61. TEST-DRIVEN DEVELOPMENT BY EXAMPLE (BECK)
How large should your test steps be? Beck says…
- You could write tests to encourage a single line of code, or
- You could write tests to underpin hundreds of lines of code
Although test-driven approaches imply small increments, we
should be able to do either.
61
62. TEST-DRIVEN DEVELOPMENT BY EXAMPLE (BECK)
What don’t you have to test?
“Write tests until fear is transformed into boredom”
So, address risk.
62
63. TEST-DRIVEN DEVELOPMENT BY EXAMPLE (BECK)
What do you have to test?
- Conditionals
- Loops
- Operations
- Polymorphism
I think you should cover all variants of a concept - it’s up to you whether you nail down
semantics to a microscopic level.
63
64. TEST-DRIVEN DEVELOPMENT BY EXAMPLE (BECK)
Can you drive development with application-level tests?
Beck says that the risk of small-scale testing (“unit testing”) is that
application behaviour may not be what users expect.
Application level testing has some advantages -
- Tests can be written by users
- Can be mixed with “programmer-level TDD”
64
65. TEST-DRIVEN DEVELOPMENT BY EXAMPLE (BECK)
When should you delete tests?
If a test is redundant, it can be deleted, but…
- Confidence - never delete a test if it reduces your confidence
in the system
- Communication - two tests exercising the same code path,
but covering different scenarios should remain
65
67. REFERENCES
[Bass et al, 2013] Software Architecture in Practice (3rd Edition) (SEI Series in Software Engineering), Len
Bass, Paul Clements, Rick Kazman, Addison-Wesley, 2013.
[Cohn, 2010] Succeeding With Agile, Software Development Using Scrum, Mike Cohn, Addison-Wesley,
2010.
[DZone, 2015] Continuous Delivery, Visualised <https://saucelabs.com/resources/white-papers/dzone-continuous-deliver-guide.pdf>
[Meszaros, 2007] xUnit Test Patterns - Refactoring Test Code, Gerard Meszaros, Addison-Wesley, 2007.
[Ford, 2014] Continuous Delivery for Architects, Neal Ford
[Ford, 2015] Engineering Practices for Continuous Delivery, Neal Ford
[IEEE 1990] IEEE Computer Society. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard
Computer Glossaries, 610. New York, N.Y.: Institute of Electrical and Electronics Engineers, 1990.
[Rechtin, 1991] Systems Architecting, Creating and Building Complex Systems, Eberhardt Rechtin, Prentice
Hall, 1991.
[Uncle Bob, 2014] Bob Martin – The Cycles of TDD <http://blog.cleancoder.com/uncle-bob/2014/12/17/TheCyclesOfTDD.html>
67
Won’t look at BDD or WebDriver, in-production testing, sorry
[Ford, 2015] Engineering Practices for Continuous Delivery, Neal Ford [pdf]
http://nealford.com/downloads/Continuous_Delivery_1of3_Deployment_Pipelines_Neal_Ford.pdf
Won’t look at BDD or WebDriver, or production metrics, sorry
This could be a unit test … or a system test
Setup - how do we get to state A?
We prefer PACT consumer-driven approaches to conventional integration testing. This emphasises early fault detection, as opposed to late, brittle or defensive approaches.
Run system A against system B
Pact is recorded - it becomes a stub for A and a spec for B
It’s not the only way - you could share tests, for example.
Active stubs can simulate unusual combinations of failure
but late testing
compare to PACT (a local stub + specification)
what checks the stub? Integration Contract Tests?
Prior to PACT, one approach would be to have a system generate consumer stubs based upon tests driven by shared specification, and push this into source control (where they can be diffed)
We don’t go into Provider States (“go to state A”)
Journey tests are more brittle
Value in checking E2E is all available
May use stubs
View tests as production code
Ford argues CD pipeline design is an architectural concern
Q. How many architects focus on pipelines, or is it left to devs / test managers?
first point is about system architecture
second point is about canonical specification
If we weigh repeatability and reliability on our system context, how does it fare? (NB. numbers are examples, not actual)
This could be a unit test … or a system test
Setup - how do we get to state A?
How do we check state B?
Do we test the entire system as a whole? The finer the granularity of state and dependencies, the simpler the tests
Decomposition and spec traceability are the challenges here
Bass - Software Architecture in Practice quality factors and tactics
Additional - Simian Army
Canonical specification
View tests as production code
Engineering - MTBF
Can supply a PACT to say ‘this is 8’
Can’t supply a non-existent PACT
Shared persistent fixture
Metrics can drive over-testing
On the job training / Deming’s 14 points
I - Independent: the user story should be self-contained, with no inherent dependency on another user story.
N - Negotiable: user stories, up until they are part of an iteration, can always be changed and rewritten.
V - Valuable: a user story must deliver value to the end user.
E - Estimable: you must always be able to estimate the size of a user story.
S - Small: user stories should not be so big as to become impossible to plan/task/prioritise with a certain level of certainty.
T - Testable: the user story or its related description must provide the necessary information to make test development possible.
These look a lot like sequential state machine assertions
We decompose acceptance criteria down onto unit tests
1. Issues with this validation - this type of loose assertion is indicative of some nondeterminism, e.g. due to the use of a shared persistent fixture.
2. It’s better if we could repeatedly, reliably assert “AvailableBalance exists and its value is $4.00”
Let’s add a failing unit test… first we might define our boundary conditions, to underpin some obvious runtime assertions.
Fence in the problem! Be strict up front, test state should be an example of “canonical production state”
If we add checks last, there’s a chance that many tests have already strayed into invalid states, meaning they will all break if we add those stipulations later.
This means we must grow our fixtures to create valid state for us.
Recommended for .NET
Don’t tie it in to KPIs!?!
Highlights of the talk - it’s well worth checking out!