The DevOps principle of “Shifting Left” promotes testing early in the development cycle to improve software quality and system health. At the same time, the rise of containerized microservice applications brings a new challenge: services are developed in isolation. It’s common practice for each service to be frequently and thoroughly tested, individually. But the services don’t get validated together until deploy time (if at all!). In this session, we’ll explore techniques for running high-fidelity integration tests across multiple services as part of a continuous integration workflow. You'll see a demo that uses Jenkins to provision, test, and tear down self-contained Kubernetes environments that replicate complete production systems. This allows you to run full-system tests as part of every build, safely and cost-effectively.
Fully test your microservices application, early and often
Dave Stanke, Developer Advocate at Google Cloud @davidstanke
Illustrations by Tracey Laguerre
Continuous Integration Testing
Technique 1: Docker Compose
[Diagram: Jenkins provisions a Docker Compose environment, runs the tests, and de-provisions it]
Fidelity: low · Isolation: high · Scalability: high · Ephemerality: (depends) · Speed: fast · Cost: medium
Technique 2: K8s Staging Cluster
[Diagram: Jenkins deploys each build into its own namespace (Web + DB) on a shared Kubernetes Engine cluster]
Fidelity: high · Isolation: medium · Scalability: highest · Ephemerality: variable · Speed: fast · Cost: low
Technique 3: Single-node “K8s”
[Diagram: Jenkins provisions a single-node “K8s” environment, runs the tests, and de-provisions it]
Fidelity: medium-high · Isolation: high · Scalability: high · Ephemerality: variable · Speed: medium · Cost: medium
Technique 4: K8s cluster per test
[Diagram: Jenkins calls gcloud to provision a dedicated Kubernetes Engine cluster (e.g. `test-123`) running Web and DB, then de-provisions it]
Fidelity: highest · Isolation: high · Scalability: (?) · Ephemerality: variable · Speed: slow · Cost: high
Integration testing techniques
Method                 Fidelity      Isolation  Scalability  Ephemerality  Speed   Cost
docker-compose         low           high       high         (depends)     fast    medium
Shared “staging” K8s   high          medium     highest      variable      fast    low
Single-node “k8s”      medium-high   high       high         variable      medium  medium
K8s per test           highest       high       (?)          variable      slow    high
Yes, we can (still) do integration testing
There are trade-offs between the different ways of doing it
It’s all about risk tolerance vs. reward of speed/cost/etc.
You have to find your own sweet spot. (When you do, tell me about it!)
@davidstanke
Editor’s notes
My name is Dave
I’m a developer advocate with Google Cloud Platform. My area of focus is the DevOps community. I spend my time listening to practitioners, working to understand your challenges, and your solutions, and your ambitions. And I share that knowledge throughout the community, and bring it back home to my colleagues at Google.
You’ve probably heard the phrase shift left. What’s that mean?
We envision software development moving from left to right. Shifting left means incorporating critical processes early in the cycle.
Why?
Because software gets more complex as it moves from “left” (dev) to “right” (prod)
Which means the cost of addressing problems gets exponentially larger
We can shift left on lots of things
Code merges (the original “continuous integration”)
Security
Testing
Research done by NIST, IBM, and others:
A bug found in production costs many times more to fix than the same bug found during development. 6X? 30X? 100X?
Why is that? I can think of a couple reasons: 1) when you catch it early, it hasn’t had time to cause any downstream effects. And 2) if a developer finds a bug in code that’s still in their active working consciousness, they don’t have to remember what was going on when they wrote that code. When you’re still in the flow, you can fix faster.
So shifting left makes developers happier. It also makes CEOs happier.
DORA: Continuous testing predicts improved business outcomes
What do we test?
Here is the test pyramid
At the bottom, lots of little unit tests, that run fast and find issues cheaply
Then, integration tests, where we combine components and see how they behave together
At the top is end-to-end tests, where we do a full user journey across several interactions. That’s out of scope for today, but the integration testing techniques we’ll look at will translate pretty well to end-to-end tests.
This is an idealized test pyramid...
Here’s what we actually usually see
a handful of unit tests
Zero integration tests
One brittle E2E test that gets ignored
No judgment here -- integration tests are hard.
Microservices make it harder
The whole point of microservices is to isolate things (components/teams) so they can launch and operate independently. As we’ve done that, I’m hearing from more and more people that each team is responsible for their service, but nobody’s really responsible for testing the whole thing. Each team pushes their own service and we hope that it all works in prod.
Hope is not a strategy! But integration testing is really hard. So why bother?
If everyone pushes separately to prod, we rely on semantic versioning. But semantic versioning is a promise. And promises are meant to be broken.
Another way to come at it is the idea of testing in prod. This is a hot topic these days.
It’s a movement that advocates that we should treat our production deployments as uncertain, and do quality testing in prod, and quickly roll back if there are problems.
Should we throw our stuff onto prod and then figure out if it works?
YES. No matter how confident you are in your code, prod is different from not-prod. Things will always break. Treat your production deployments as uncertain, test and iterate. Plan for failures, and even proactively trigger errors.
Testing in prod--if you do it well--isn’t some haphazard YOLO mode of operation. It’s a disciplined framework for responding to errors, which are inevitable, instead of pretending you can prevent all of them.
But it doesn’t dictate that we should stop pre-prod testing
It’s about risk. We need to think about what could go wrong, and what are the consequences when it does.
If you’re running a site of memes, you’ve got high pressure to keep up with evolving culture, and low cost of errors. When a user gets an error, they’ll click retry. No big whoop.
If you’re running an eCommerce site…
If you’re running a hospital back-end…
We might want some additional assurance
We need to make informed choices about our risk tolerance, and then continuously optimize how we do integration testing.
How similar is the test environment to prod? The closer it is, the more diagnostic the results. But also, the more expensive and complicated that infra is likely to be. And it may not be possible to actually match… you can’t run a 200-node Kubernetes cluster on your laptop.
One of my colleagues at Google asked about this just yesterday.
We don’t want tests to interfere with each other, or with prod systems. Each test should have a known environment at the beginning, which remains consistent until the end.
Have you ever been running a test, and then in the middle of the test, everything goes haywire? And you shout across the office: “what’s up with staging?” And someone says “oh, that’s me, I dropped the database so I can test the restore process.”
Generally best not to test against external resources / APIs.
Even if it’s “just reads,” your test might overload the system or throw some nasty malformed request
According to DORA, reducing the time from commit to deploy is correlated with positive DevOps outcomes, which are correlated with business success. So we need fast CI/CD so we can get fast deployments. Fast is better than slow.
Beyond speed, we need, well: speed. No matter how fast we can get individual builds to go, pretty soon you’re probably going to need to run more than one at a time. If there are more builds than workers, there’s going to be a build queue. And that makes developers less productive, which makes them sad. And it makes stakeholders sad. So we want to be able to run lots of builds at the same time.
Ideally, test infra lasts exactly as long as you need it.
If it’s not being used, we don’t want to pay for it!
So, it disappears right at the end of the test, yeah?
Not necessarily. If everything works, then we want to recycle the resources. But if there’s a failure, we might want to connect to the test system and poke around.
But how long should things last? Hard to say. Maybe, a few hours?
Because, of course! Nobody wants to spend more money on their testing infrastructure than on their production infrastructure.
There are two factors at play here: total cost, which is the rational concern, and aligning cost to usage, which is more of an emotional concern... If you can make costs scale in line with usage, then even if you’re paying a lot, you have the assurance that it’s for a reason.
Demo app: microservice application targeting K8s
(brief tour of app code, k8s config)
Each service is independently packaged and can run separately
But each is pretty useless without the other
In our context, we’re going to treat each microservice as a “unit”... in reality you probably have more than one method in each microservice, so unit tests would be a smaller thing and you may have integration tests at a sub-service level.
For today, though, let’s say a unit test covers a single component, an integration test covers multiple components working together, and an end-to-end test covers the entire application across a full user journey.
Running on K8s, to get kubernetes goodies for free. But that’s not essential to this story.
I’m going to show four different ways to test multiple microservices together. They all do the same thing, but they use different techniques for provisioning the test infrastructure.
Here’s the common flow that they do. First...
For the first technique, we’re going to use docker compose. Docker compose is a way to stitch together multiple containerized services. In some ways it’s like a Kubernetes lite. It’s great for developers, because it’s really easy to write the config, and you can run it on your laptop with just Docker. You don’t need a full-on kubernetes environment.
So we can also run our integration tests with it. There are a few different ways to run docker-compose in a CI pipeline. For this example, we’re going to use a self-destructing VM.
Wait, a WHAT?
A self-destructing VM. As part of the test, we spin up a VM to run docker-compose in. Now that creates a risk of orphaned resources. If the test fails, the VM will hang around. Which is great for debugging, but not so great for de-billing. So we’re going to program the Compute Engine instance to delete itself after a set period of time.
To do this, we instantiate it with a startup script that will set a timer on the instance. And we set the timeout duration as instance metadata.
When the allotted time has passed, the instance makes a call to the GCE API that says “delete the instance named ‘xyz’,” where ‘xyz’ is its own name.
And then poof.
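Here is a rough sketch of what that self-destruct mechanism could look like. The metadata key (`self-destruct-timeout`), zone, and instance name are illustrative, and the sketch assumes gcloud is available on the instance and its service account is allowed to delete instances:

```bash
# startup-script.sh -- runs on the VM at boot (illustrative sketch)
# Read the timeout (seconds) and the instance's own identity from the GCE metadata server.
TIMEOUT=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/self-destruct-timeout")
NAME=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/name")
ZONE=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/zone" | awk -F/ '{print $NF}')

# Once the allotted time has passed, the instance deletes itself.
(sleep "$TIMEOUT" && gcloud compute instances delete "$NAME" --zone "$ZONE" --quiet) &
```

And on the Jenkins side, the VM is created with that script and the timeout attached as metadata:

```bash
gcloud compute instances create test-worker-123 \
  --zone us-central1-a \
  --metadata self-destruct-timeout=7200 \
  --metadata-from-file startup-script=startup-script.sh
```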
We start at a commit...
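On that VM, the test steps themselves can be very short. A minimal sketch, assuming the repo’s compose file defines a `tests` service (which these notes don’t show):

```bash
docker-compose up -d --build     # build and start all the services together
docker-compose run --rm tests    # run the integration test suite against them
docker-compose down -v           # tear everything down, volumes included
```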
PROS:
Developer friendly
CONS:
Docker Compose is not Kubernetes
Redundant configs (may drift)
Standing staging cluster
We start at a commit...
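As a sketch of what Jenkins might run for each build against the shared staging cluster (the namespace naming, the `k8s/` manifest path, and the `web` deployment name are illustrative, not from the demo):

```bash
NS="build-${BUILD_NUMBER}"                 # BUILD_NUMBER is supplied by Jenkins
kubectl create namespace "$NS"
kubectl apply -n "$NS" -f k8s/             # deploy this build's manifests into its own namespace
kubectl wait -n "$NS" --for=condition=available deployment/web --timeout=120s
# ...run the integration tests against the services in $NS...
kubectl delete namespace "$NS"             # tear down (perhaps skipped on failure, for debugging)
```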
PROs:
Fast
High fidelity
CONs:
Imperfect isolation -- can’t test cluster-level resources like CRDs; can’t test different versions of k8s
On failure, the namespace may be orphaned, but that’s not expensive.
We’re going to use that VM to run k3s, which is an open-source server that provides a Kubernetes-compatible runtime on a single node.
There’s also minikube, microk8s, and a couple others.
We start at a commit...
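A minimal sketch of that flow on the self-destructing VM, reusing the same illustrative manifest path and deployment name as before:

```bash
curl -sfL https://get.k3s.io | sh -        # install k3s via its official install script
sudo k3s kubectl apply -f k8s/             # deploy the application manifests onto the single node
sudo k3s kubectl wait --for=condition=available deployment/web --timeout=180s
# ...run the integration tests, then let the VM self-destruct as in the docker-compose technique...
```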
PROs:
Perfect isolation
High fidelity -- including cluster-level testing
CONs:
Not exactly a production K8s cluster
Expensive orphaned resources
Why not provision a complete, dedicated K8s cluster for every test?
We certainly can.
Slower, cost overhead.
This is not a live demo.
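Roughly, the gcloud calls would look like this sketch; the cluster name, zone, and node count are illustrative:

```bash
CLUSTER="test-${BUILD_NUMBER}"
gcloud container clusters create "$CLUSTER" --zone us-central1-a --num-nodes 3
gcloud container clusters get-credentials "$CLUSTER" --zone us-central1-a
kubectl apply -f k8s/                      # deploy, then run the integration tests
gcloud container clusters delete "$CLUSTER" --zone us-central1-a --quiet
```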
Comparison matrix of approaches in terms of:
Fidelity
Isolation
Scalability
Ephemerality
Speed
Cost
Which is “best?”
For most cases, namespaces on staging k8s
Lots of services?
Slow to start up and coordinate everything
Find a balance between
Launching all services per test
Relying on standing instances of some services
Use of mocks/stubs/fakes
Semver a/k/a “hope is a strategy” ;-)
Databases (or other non-k8s-service things)?
Run as ephemeral in test (see the sketch after this list)
Run as “stateful” (but temporary) within test
Cloud database service!
Call out to external (shared? isolated?) resource
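For the first of those options, a throwaway database can live and die with the test environment. A minimal sketch, assuming a Postgres container and the per-build namespace from the staging-cluster example (image, credential, and port are illustrative):

```bash
kubectl -n "$NS" create deployment db --image=postgres:15
kubectl -n "$NS" set env deployment/db POSTGRES_PASSWORD=test-only   # throwaway credential for this run
kubectl -n "$NS" expose deployment db --port=5432                    # reachable as "db" inside the namespace
# deleting the namespace at the end of the test removes the database and its data with it
```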
Test Data?
This one’s tough. Best practice = maintain a full and representative test dataset as part of tests
Challenges:
Drift from prod
Mirroring from prod is a nice idea, but PII!
Sanitize / anonymize
Data size (small = not representative; large = takes too long)
Here’s a great resource for learning how to use Jenkins, and of course especially how to use it with Google Cloud.
I particularly want to mention one of the plugins that we publish: the Google Compute Engine plugin. It automatically provisions GCE instances on the fly. So the worker node is created at the start of the build, and then it can be automatically deprovisioned at the end. Or it can hang around a while for debugging. OR, and this is my favorite part, you can configure it so that at the end of a build, the VM worker is recycled, but a snapshot of it is captured before it’s shut off. So you can come back any time and spin that worker back up to troubleshoot. The cost of storing those images is really cheap, so I think this is the best of both worlds.
With that, I want to say thank you. I’d love to hear your thoughts on this, so come chat with me, or tweet at me. Tell me what works for you. Thanks!