apidays LIVE New York 2021 - API-driven Regulations for Finance, Insurance, and Healthcare
July 28 & 29, 2021
APIOps: automating API operations for speed and quality at scale
Melissa van der Hecht, Field CTO at Kong
Slide 1: APIOps - Automating API operations for speed and quality at scale
Melissa van der Hecht, Field CTO, Kong (The Cloud Connectivity Company)
Slide 12: API Platform Team
“Oh no! I approved this API, but I forgot to check the security...”
“I’m sure it’s fine. I haven’t got time to check everything.”
Slides 23-37: API lifecycle diagram, built up step by step
Lifecycle stages: Design, Build, Deploy, Publish, Operate, with automated governance between stages
Asset inventory in version control:
1. API spec
2. API tests
3. API implementation
4. Declarative config (generated)
Outcome:
1. Validated API spec
2. Validated API implementation
3. Registered endpoints (automated)
4. Governed endpoints (automated)
5. Discoverable, documented endpoints (automated)
6. Ongoing operations (automated)
Slide 43: API Platform Team
“It’s incredible how much time we’ve saved automating these reviews.”
Automated steps shown: lint spec, run tests, generate declarative config
Slide 47: Operations Team
STATUS REPORT
Uptime: 100%
Deployment speed: Instant
Three deployments, each showing: Status: ok, Deployment success, Plugins applied
I’m Melissa, and I’m super excited to be here! I’m Field CTO at Kong, where I’ve been for 2 years, and I was at MuleSoft for about 5 years before that, so I’ve been very much part of defining what an API-first approach means, and how to adopt it.
And there’s a trend I’ve been seeing recently: the approaches we took in the past to become API-driven aren’t scaling any more, because our landscapes and requirements have changed. Most companies I speak with now are actually having to trade off between the delivery speed and the quality of their APIs.
And this session is all about that. We’re going to look at how we can automate the API lifecycle with APIOps, to give you both at scale.
Let me start off by welcoming you to Acme, a large bank with a sprawling tech landscape. The company’s been around for several decades, so they’ve got a lot of legacy systems and tools, and multiple siloed engineering teams.
As part of Acme’s digital transformation they’re migrating most of their workloads to the cloud and Kubernetes, and adopting more of a consistent, API-driven approach.
In fact, the Mortgages team has just identified the next API they need to build.
Emily’s finished designing it, and she’s reviewing the spec with her team. They all agree it looks great, so as per their normal process, she sends it off to the API Platform team for review, and moves on to her next task.
The API Platform team owns Acme’s API Platform, as well as the overall architecture. They host and manage the platform on behalf of the rest of Acme, with the goal of raising the overall Engineering standards across the organisation.
A group of them meet once a week to go through all the new APIs that have been submitted, and check them against Acme’s standards.
Sadly, in this case, Emily’s spec is not approved.
It turns out there’s a whole set of standards that Emily just doesn’t know about. They’re probably documented somewhere, but they’re not very well communicated, and it’s definitely not done in a developer-friendly way.
So a week after she submits it for review, the Platform team reject Emily’s spec, and it gets pushed back down to her.
This is pretty embarrassing for Emily. She’s getting called out in front of her peers for not doing a good enough job.
This is also a huge waste of everyone’s time. Emily’s going to have to redo her work, and the Platform team are doing these reviews manually, at scheduled intervals, so there are several days wasted even just waiting for that review.
And it’s not just Emily and the Mortgages team that suffers here.
Acme’s following best practice and using a single API Platform for global discovery and re-use across the business.
Which means, as adoption grows, the Platform team needs to support more and more teams across the organisation, and they get more and more APIs coming in for review.
On top of all the other work they have to do…
The Platform team ends up being stretched very thin, so rather than spending enough time fully reviewing every API, they end up having to prioritise.
Compliance becomes a nightmare, and things start to fall through the cracks...
...which isn’t so good for the Operations team, who are responsible for maintaining the overall IT estate.
Enough has fallen through the cracks that there are a lot of errors in production. Nothing’s guaranteed to be consistent, and deployments are pretty painful. In fact, they refuse to deploy new code more than once a week because it causes so much instability.
Elsewhere in Acme, the Mobile team operates a little differently.
Their goal is building rich, digital experiences for Acme’s customers, as a reaction to the mobile-only banks that were threatening to displace them. They’ve been given a lot of freedom so that they can get these applications out as quickly as possible.
They’re about to release their latest Open Banking app, and this one’s a big deal for Acme because it’s the first time they’re exposing actual API endpoints to customers.
Having seen the delays getting APIs live elsewhere in Acme, the Mobile team’s decided to do things their own way and bypass the API Platform team altogether.
But they were in such a hurry to go live on time that they just focused on the implementation code and missed some API best practices.
And this means their APIs are inconsistent. They’re hard to find, hard to access, and hard to use - which puts people off, whether that’s internal or external consumers. Acme’s prospects are much more likely to go to one of its FinTech competitors who know how to treat APIs as products, because treating an API as a product is what makes it consumable.
Making matters worse is the fact that someone in the Mobile team forgot to secure one of his APIs when he published it. This then got exploited, and Acme detected a data breach affecting 15 million customer accounts.
Acme started off with all the right intentions… but they’ve ended up trading off between speed and quality - and this is what I’m hearing time and time again is the biggest pain point in API adoption.
In fact I’ve surveyed hundreds of people over the last few months and, consistently, 80% of audiences are making this trade-off right now. I’m interested to know, actually - stick it in the chat - are you having to trade off right now, and is it to prioritise speed or to prioritise quality?
This problem is where APIOps comes in.
APIOps is the automation of the full API lifecycle. It combines DevOps philosophies, when it comes to iterative design and continuous testing, with GitOps philosophies in terms of automated, declarative deployments.
Where before, we saw manual, costly, and error-prone activities at Acme, we now automate all of it.
Let’s see what that actually means.
We know the API lifecycle, this is nothing new.
Best practice means that we design an API before we build it, and then once it’s deployed we add governance and operational policies to manage it, before making it discoverable to consumers in a Portal.
Then there are all the ongoing operations, and this lifecycle continues going round until we retire the API.
This is no different with APIOps. We’re still following best practice, but what you’ll see is that the processes we follow at each step, and BETWEEN each step, have changed.
So, at design time, we use a design environment, like Kong’s open source Insomnia, to easily create the API spec - which is typically a Swagger or OAS document.
We also create a test suite for that spec. Here we should check several things - like, are we getting the responses we expect in certain conditions?
What’s critical here is that the tooling we use gives us instant validation. That’s linting of the spec against best practices, and the ability to run those tests locally and validate what you’re building. As the designer of the API, you need to have self-serve tooling that makes it easy to do the right thing from the beginning - you don't want to end up like Emily.
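To make the idea of instant, local linting a little more concrete, here is a minimal Python sketch. It is not how Insomnia or any real linter works internally; the rules, the function name lint_spec, and the example draft_spec are all hypothetical, chosen only to show a spec being checked against a few conventions before it leaves the designer's machine.

```python
# A minimal sketch of spec linting. The rules below are hypothetical examples,
# not the rule set of any real linter.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def lint_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems found in an OpenAPI-style spec."""
    problems = []
    info = spec.get("info", {})
    if not info.get("title"):
        problems.append("info.title is missing")
    if not info.get("version"):
        problems.append("info.version is missing")
    for path, item in spec.get("paths", {}).items():
        if not path.startswith("/"):
            problems.append(f"path '{path}' must start with '/'")
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            where = f"{method.upper()} {path}"
            if not operation.get("operationId"):
                problems.append(f"{where}: operationId is missing")
            if not operation.get("responses"):
                problems.append(f"{where}: no responses defined")
    return problems


# Hypothetical draft spec with two deliberate omissions.
draft_spec = {
    "openapi": "3.0.0",
    "info": {"title": "Mortgages API"},
    "paths": {
        "/applications": {
            "get": {
                "operationId": "listApplications",
                "responses": {"200": {"description": "OK"}},
            },
            "post": {"responses": {"201": {"description": "Created"}}},
        }
    },
}

for problem in lint_spec(draft_spec):
    print(problem)
```

Run locally, this reports the missing version and the missing operationId straight away, which is the kind of feedback the designer would otherwise wait a week for.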
When you’ve created the spec and validated it locally, you then push it into Git - or whichever version control system you use - and you raise a Pull Request for this new API.
This triggers a governance checkpoint embedded in our pipeline.
Before any time is spent building the API, we need to be sure that what’s going to be built follows our company standards, and is aligned with everything else in the ecosystem.
We automatically invoke the API tests that we built earlier, and any other governance checks we want to include at this stage of the pipeline - for example, are we paginating consistently across many APIs?
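As an illustration of a cross-API governance check like the pagination example, the following Python sketch compares the pagination query parameters used by several specs against one agreed convention. The convention itself (limit and offset), the list of recognised parameter names, and the helper list_endpoint are assumptions made up for this sketch.

```python
# A minimal sketch of a governance check run in the pipeline.
# COMPANY_STANDARD and PAGINATION_NAMES are assumptions for illustration only.
COMPANY_STANDARD = frozenset({"limit", "offset"})
PAGINATION_NAMES = {"page", "page_size", "per_page", "limit", "offset", "cursor"}


def pagination_params(spec: dict) -> set[str]:
    """Collect pagination-style query parameter names used by GET operations."""
    found = set()
    for item in spec.get("paths", {}).values():
        for param in item.get("get", {}).get("parameters", []):
            if param.get("in") == "query" and param.get("name") in PAGINATION_NAMES:
                found.add(param["name"])
    return found


def pagination_violations(specs: dict[str, dict]) -> dict[str, list[str]]:
    """Map each API name to the pagination parameters that break the standard."""
    violations = {}
    for api_name, spec in specs.items():
        extra = sorted(pagination_params(spec) - COMPANY_STANDARD)
        if extra:
            violations[api_name] = extra
    return violations


def list_endpoint(*names: str) -> dict:
    """Hypothetical helper: build a tiny spec with one paginated GET endpoint."""
    params = [{"name": name, "in": "query"} for name in names]
    return {"paths": {"/items": {"get": {"parameters": params}}}}


specs = {
    "mortgages": list_endpoint("limit", "offset"),
    "mobile": list_endpoint("page", "per_page"),
}
print(pagination_violations(specs))  # {'mobile': ['page', 'per_page']}
```

Because the check runs on every spec pushed into version control, the designer does not need to know the convention in advance; the pipeline tells them when they have drifted from it.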
Like before, there are going to be checks that the Platform Owners will want to do for every API, that Emily, and the other API Designers, won’t have awareness of.
But unlike before, this is not a manual review. This is now an automatic, and therefore instant, process. In Kong, we enable this through the open source command line tool Inso.
If the spec fails any of those tests, it gets automatically pushed back for more work in the design phase. Emily doesn’t have to sit around waiting for a response from the Platform team; she just gets an instant, automated notification that something needs to change.
And because this is an automated check, embedded in the pipeline and triggered by default when a spec is pushed into Git, it means there’s 100% coverage of these checks for every API that’s being designed, anywhere in Acme.
So we’re now consistently catching errors as close to the beginning of the pipeline as possible, which means they’re much faster and cheaper to remediate. In fact, it’s estimated that finding and fixing a bug at this stage costs 1% of what it would in production.
When all the tests pass, then we have a validated spec and can now progress onto the build phase.
Here we build our API in the normal, best practice way - we use the spec as the contract to tell us what the API needs to do and what the interface needs to look like, and we use the tests as we go to validate that the API we’re building meets the spec.
As before, when the developer commits their code saying it’s ready for deployment, a series of tests is triggered.
We automatically execute the tests that we built at design time again, to make sure the API still meets our best practice. These tests are actually our unit tests and will also make sure that the implementation of our API functions how it should. There may well be additional tests that we also want to carry out at this stage, still automatically.
If any of the tests fail, we know immediately. We do not deploy the API; we go back and make the necessary changes until our implementation is how we need it.
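A rough Python sketch of this build-time gate follows: a response from the implementation is checked against the schema taken from the spec, and deployment is only allowed when there are no violations. The schema, the stand-in implementation get_application, and the function names are hypothetical; a real pipeline would call the running API and use a full schema validator.

```python
# A minimal sketch of a contract test used as a deployment gate.
# APPLICATION_SCHEMA and get_application are hypothetical stand-ins.
APPLICATION_SCHEMA = {
    "required": ["id", "status"],
    "properties": {"id": {"type": "string"}, "status": {"type": "string"}},
}
PYTHON_TYPES = {"string": str, "integer": int, "boolean": bool}


def contract_violations(response: dict, schema: dict) -> list[str]:
    """List the ways a response body deviates from the schema in the spec."""
    problems = [
        f"missing field '{name}'"
        for name in schema.get("required", [])
        if name not in response
    ]
    for name, prop in schema.get("properties", {}).items():
        expected = PYTHON_TYPES.get(prop.get("type"))
        if name in response and expected is not None:
            if not isinstance(response[name], expected):
                problems.append(f"field '{name}' should be of type {prop['type']}")
    return problems


def get_application(app_id: str) -> dict:
    """Stand-in for the API implementation under test."""
    return {"id": app_id, "status": "submitted"}


def ready_to_deploy(response: dict, schema: dict) -> bool:
    """The gate: deploy only when the implementation meets the contract."""
    return not contract_violations(response, schema)


print(ready_to_deploy(get_application("a-123"), APPLICATION_SCHEMA))
print(contract_violations({"id": 42}, APPLICATION_SCHEMA))
```

The first call passes the gate; the second shows what the developer would see immediately if the implementation drifted from the spec.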
And we can keep executing these tests for continuous validation of what we’re doing.
When those tests pass, we progress forward to deployment.
Now this is where we start to see more of a GitOps approach.
Because when this round of automated tests has been passed, we then automatically generate the declarative configuration file for this API.
And this is one of the central components of GitOps. It’s all about declarative, rather than imperative, ways of managing deployments. This is the modern way of managing infrastructure because it has so many benefits in terms of deployment speed, auditability, and repeatability - benefits that we need when we consider the level of scale and complexity that we have to manage now compared to a few years ago.
For those that aren’t familiar with a declarative approach, it’s a lot more streamlined than the traditional imperative approach to CI/CD.
If we’re doing things declaratively, we just specify what we want the end result of something to be, whereas with imperative we also have to specify HOW to get that end result.
In the context of deployments, if we’re implementing CI/CD the imperative way, we write a script that orchestrates every step that needs to happen. Call this Admin API, extract that value, use it to call this API, add the policies with that API… and so on. This is a pain to first set up, a pain to debug if something goes wrong, and a pain to rewrite if and when one of the underlying Admin APIs changes.
But if we’re doing it the declarative way, we don’t need to worry about any of that. We just tell the Platform what it needs to look like when that API’s been deployed, and the platform itself takes care of how that’s achieved.
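One way to picture the difference is the small Python sketch below. The caller only supplies the desired state; a function standing in for the platform works out which steps are needed to get there. The state layout, the service names, and plan_changes are all hypothetical, and the plugin names are only illustrative labels.

```python
# A minimal sketch of declarative reconciliation: the caller states WHAT the
# platform should look like, and the platform derives HOW to get there.
def plan_changes(current: dict, desired: dict) -> list[tuple[str, str]]:
    """Work out the steps needed to move from the current to the desired state."""
    steps = []
    for name in sorted(desired.keys() - current.keys()):
        steps.append(("create", name))
    for name in sorted(desired.keys() & current.keys()):
        if desired[name] != current[name]:
            steps.append(("update", name))
    for name in sorted(current.keys() - desired.keys()):
        steps.append(("delete", name))
    return steps


# Hypothetical platform state before and after a deployment.
current = {
    "accounts": {"plugins": ["key-auth"]},
    "legacy": {"plugins": []},
}
desired = {
    "accounts": {"plugins": ["key-auth", "rate-limiting"]},
    "mortgages": {"plugins": ["key-auth"]},
}

print(plan_changes(current, desired))
# [('create', 'mortgages'), ('update', 'accounts'), ('delete', 'legacy')]
```

In the imperative style, the pipeline author would have to write and maintain each of those steps by hand; here they fall out of comparing two descriptions of state.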
This simplicity is why every modern deployment approach is now declarative.
And the same is true in APIOps.
The beauty here is that we shouldn’t even need to write that declarative config file ourselves – in tools like Kong we can automatically generate it from the API spec. So we can have it instantly.
And because it’s generated from the spec, it’ll be completely accurate and consistent with the spec, so we know that nothing will be forgotten about, and there’s no chance of human error in that deployment process.
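To give a feel for what generating config from the spec can look like, here is a minimal Python sketch. The output structure (services containing routes and plugins) is a simplified illustration and should not be read as Kong's actual declarative format or as what its tooling emits; generate_config, the upstream URL, and the default plugin list are assumptions for this example.

```python
# A minimal sketch of deriving a declarative config from an API spec.
# The output layout is a simplified illustration, not a real platform format.
HTTP_METHODS = {"get", "put", "post", "delete", "patch"}


def generate_config(spec: dict, upstream_url: str, default_plugins: list[str]) -> dict:
    """Build a declarative description of one service from an OpenAPI-style spec."""
    service_name = spec["info"]["title"].lower().replace(" ", "-")
    routes = []
    for index, (path, item) in enumerate(spec.get("paths", {}).items()):
        methods = sorted(m.upper() for m in item if m in HTTP_METHODS)
        routes.append(
            {"name": f"{service_name}-route-{index}", "paths": [path], "methods": methods}
        )
    return {
        "services": [
            {
                "name": service_name,
                "url": upstream_url,
                "routes": routes,
                # Plugins every API must carry, so security cannot be forgotten.
                "plugins": [{"name": plugin} for plugin in default_plugins],
            }
        ]
    }


spec = {
    "info": {"title": "Mortgages API", "version": "1.0.0"},
    "paths": {"/applications": {"get": {}, "post": {}}},
}
config = generate_config(
    spec,
    upstream_url="http://mortgages.internal:8080",  # hypothetical upstream
    default_plugins=["key-auth", "rate-limiting"],  # illustrative plugin names
)
print(config["services"][0]["routes"])
```

Since every route and plugin entry is computed from the spec and a shared default list, nothing depends on someone remembering to add it by hand.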
So - that declarative configuration, having been automatically generated as part of the pipeline, instructs the API Platform what it needs to look like once the API's been deployed, and the platform goes off and configures itself. So we end up with our API registered in the platform, and with the various security, governance, and operational plugins for that API configured as well.
It’s also worth noting that we store this declarative config file in version control, along with the spec, tests, and the implementation of that API. This means we have a complete, searchable and auditable history of every deployment we’ve made. So if ever there’s a problem once we’ve deployed the API, then we can very easily roll back to a previous state - so it’s not just that we’ve made deployment easier, but roll backs as well.
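The rollback idea can be sketched in a few lines of Python. The class below is an in-memory stand-in for the version-controlled history of one API's declarative config; in practice that history would live in Git. ConfigHistory and its methods are hypothetical names, and the rollback is modelled as re-committing an earlier version so that the rollback itself stays in the audit trail.

```python
# A minimal sketch of rollback using a history of declarative configs.
# ConfigHistory is a hypothetical in-memory stand-in for version control.
import copy


class ConfigHistory:
    """Keeps every committed version of one API's declarative config."""

    def __init__(self) -> None:
        self._versions: list[dict] = []

    def commit(self, config: dict) -> int:
        """Store a copy of the config and return its version number."""
        self._versions.append(copy.deepcopy(config))
        return len(self._versions) - 1

    def current(self) -> dict:
        """Return a copy of the most recently committed config."""
        return copy.deepcopy(self._versions[-1])

    def roll_back(self, to_version: int) -> dict:
        """Re-commit an earlier version so the rollback is part of the history."""
        restored = copy.deepcopy(self._versions[to_version])
        self.commit(restored)
        return restored

    def __len__(self) -> int:
        return len(self._versions)


history = ConfigHistory()
good = history.commit({"name": "mortgages-api", "plugins": ["key-auth"]})
bad = history.commit({"name": "mortgages-api", "plugins": []})  # security dropped

# Feed the earlier configuration back in as the desired state.
history.roll_back(good)
print(history.current())
print(len(history))
```

After the rollback the current config is the earlier, secured one, and the history holds three entries rather than two, so the faulty deployment and its reversal both remain auditable.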
Of course, once we’ve deployed the API, we need to validate it performs how we expect and check that we haven’t caused any errors. We’re now in an environment where other APIs and code are deployed, so we should do some integration testing, security testing, performance testing… whatever's appropriate depending on where you are in your software development lifecycle.
So we’ll run that series of release checks before we actually publish this API and make it discoverable. These checks should also all be automated, although you may want a final sign-off as a manual step before you push that Publish button.
When you’re ready to Publish that API, then registering it in the Portal, enabling self-serve access, and adding the spec for that API, should be an automated process as well. After all, the only way to ensure every API is discoverable and documented in the Portal, is to automate that publishing.
Now, what we’ve built up as we’ve gone through the API lifecycle, is an inventory of assets that enable us to operate this API on an ongoing basis in an almost entirely self-sufficient way.
If we need to scale out the API to handle higher throughput, that can be completely automated using the declarative configuration. Since this is version controlled, we’ll see a completely repeatable, identical deployment to before.
The overall result here when our API lifecycles follow APIOps is that the continuous automated testing and deployment means we catch and resolve errors and deviations from our standards early, speeding up deployment and raising quality and consistency.
Acme’s just adopted APIOps.
In the Mortgages team, Emily’s working on another API.
As before, she’s following best practice by doing this design first.
But unlike before, the tool she’s using to create her design gives her instant feedback on it, so she can make sure, herself, that the spec she’s building doesn’t violate any policies.
She’s skipped several days of back and forth with the API Platform team to get this right; instead, it just takes her a few minutes on her own.
And once her spec meets standards, she then has the ability to push it directly into Git, so that it triggers the next part of the automated APIOps pipeline.
This creates a Pull Request in Git, for the API Platform team to then review and decide whether to approve and merge it into the code base, or to reject it and send it back to Emily for more work.
Life is very different now in the API Platform team. They’ve automated the API review process, and applied it to every single API coming in for review - so they’ve got 100% coverage of every quality, security, and compliance check, across every single API being built at Acme.
Their QA costs have gone way down, and they’re no longer the bottleneck for APIs being deployed. If there was a problem in Emily’s spec she wouldn’t have to wait for the next scheduled review session - because these automated reviews are triggered whenever a new API’s been submitted.
But this time there weren’t any issues with Emily’s spec - the tool she used at design time made it easy for her to do the right thing from the beginning, so the chances of her API meeting standards are now much higher.
Once the automated tests have passed, the last step is the automatic generation of the declarative config file from the spec.
And this is then added to Git and picked up by the Operations team.
Not that they really have to do much.
The API Platform configures itself based on the declarative config. It registers the endpoints, and applies and configures all the necessary plugins as well - so no more forgetting the security. The Platform also automatically makes the endpoint discoverable in Acme’s Portal.
So Emily’s API is deployed immediately, and smoothly.
Deployment is much more likely to go smoothly with APIOps because a) everything has been tested, so the chance of introducing problems is lower, and b) the deployment is completely automated and declarative.
In fact, deployments are now so repeatable that the Operations team have removed their limit of one a week. They’re now deploying in a truly continuous fashion, and can meet the increasing demands as API adoption accelerates across Acme.
Of course, there will still be times when things go wrong - that’s unavoidable.
But the impact of something going wrong is now much easier to minimise. Since every version of each declarative configuration is in version control, we have a complete history of every deployment. And since these files are all declarative, it’s very easy to just revert back to a previous state - the Operations team just needs to feed in one of the previous configurations to the Platform and it’ll revert itself back.
Things are quite different for API consumers now too.
Through APIOps we’re ensuring every API is consistently discoverable, secured, documented, and reliable.
This means Acme’s portal is now a thing of beauty: it’s a catalogue of products, where each product is a well-designed API.
All of this means that Acme can now operate at pace, without lowering delivery quality. In fact, they’re increasing quality whilst reducing costs - which means they’ve got much more resource to innovate than they could before, so they’re constantly delivering new capabilities and experiences to their customers.
This is the power of APIOps, and it’s not just open to companies like Acme. Every single organisation can automate the API lifecycle like this, if you have the right API tooling and the right API-first mindset.
Thank you very much! If you’ve got any questions, you can find me in the chat.