Introducing Continuous Delivery practices to a team in trouble can be daunting. Where do you start? What do you do first? Which battle do you pick first?
I’ll share my experience of guiding a team to achieve a higher degree of delivery maturity. It is a journey from a troubled, struggling start of chaotic manual deployments, merge hell, regular production rollbacks and lost code, to delivering a single commit from trunk to production automatically and reliably, in under an hour, many times a day.
Think about how long it would take you to deliver a one-line code change in your application to production.
~Mary and Tom Poppendieck
Lean Software Development: An Agile Toolkit
“Continuous Delivery is a software development discipline where you build software in such a way that the software can be released to production at any time.”
http://martinfowler.com/bliki/ContinuousDelivery.html
“Continuous integration (CI) is the practice, in software engineering, of merging all developer working copies to a shared mainline several times a day.”
“A longtime dream come true! It is exciting to be able to educate patients while they wait in the exam room. This is the future.”
~Comment received from a doctor
How many of you are already doing CD?
Started doing CD
Want to do CD
Like any other entrepreneur, Bob has a big idea for his product.
Then integration day arrives, and this is what happens...
The release is delayed, something they never expected.
Demotivated Team
Unhappy Customers
Furious Business team
Familiar with the above story? Has it ever happened to you?
Let’s analyse what went wrong for Team X. They thought everything was going fine, but that later proved to be wrong.
Mainline development, also known as trunk-based development, is where everyone on the team commits to a single branch. Continuous Integration on the mainline branch keeps that branch ready for deployment at any point in time.
If all team members commit to trunk, what about features that are incomplete, or that require feedback from key stakeholders before being opened to end users?
That’s where feature toggles come into the picture.
A Feature Toggle - also known as a Feature Flag, Feature Flip or Feature Switch - is a simple technique where you can turn on or off a certain feature through configuration.
Keeping the toggle off in production environments means end users never see the incomplete feature.
Here is some sample Ruby code for the same.
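A minimal sketch of such a release toggle, assuming a hypothetical FeatureToggle class and a hypothetical :new_checkout feature; it illustrates the idea rather than reproducing the original sample.

```ruby
# Minimal feature-toggle sketch. FeatureToggle and :new_checkout are
# hypothetical names used only for illustration.
class FeatureToggle
  # toggles: Hash of toggle name (Symbol) => true/false, typically loaded
  # from per-environment configuration.
  def initialize(toggles = {})
    @toggles = toggles
  end

  # Unknown toggles count as off, so unfinished work stays hidden by default.
  def on?(name)
    @toggles.fetch(name, false) == true
  end
end

# In production the incomplete feature is switched off.
FEATURE_TOGGLES = FeatureToggle.new(new_checkout: false)

def checkout_page
  if FEATURE_TOGGLES.on?(:new_checkout)
    "new checkout"   # incomplete feature, still under development
  else
    "old checkout"   # current behaviour that users see
  end
end

puts checkout_page # prints "old checkout"
```

Treating unknown toggles as off is a deliberate choice here: forgetting to configure a toggle hides the feature instead of exposing half-finished work.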
Yes, I know what you are thinking: so many “if...else” blocks in the codebase. It’s true, and it can get complicated. But it’s only for a short period of time. Once the feature is done, the toggle can and should be removed completely.
The toggles can live in configuration files (YAML or properties files) or in any other configuration mechanism that is your standard or that your framework provides.
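As one possible setup, the toggles could be read per environment from YAML using Ruby’s standard YAML library. The file layout, the environment names and the toggle names below are assumptions for illustration. Switching a feature off then becomes a one-line change to this configuration followed by a redeploy.

```ruby
require "yaml"

# Hypothetical contents of a per-environment toggle file
# (for example config/toggles.yml in a real application).
TOGGLE_CONFIG = <<~YAML
  production:
    new_checkout: false
    recommendations: true
  staging:
    new_checkout: true
    recommendations: true
YAML

# Returns the toggles for one environment as a Hash with Symbol keys.
# An unknown environment yields no toggles, so every feature reads as off.
def load_toggles(yaml_text, env)
  all = YAML.safe_load(yaml_text) || {}
  (all[env] || {}).transform_keys(&:to_sym)
end

production_toggles = load_toggles(TOGGLE_CONFIG, "production")
puts production_toggles[:new_checkout] # prints "false"
```

YAML.safe_load is used instead of a full load so the toggle file can only contain plain data such as strings, numbers and booleans.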
As discussed, the most common use is to hide features under development; these are called Release Toggles.
This lets us separate deployment from release.
It also helps us roll back features quickly when a recent deploy causes surprises in production. The usual practice is to roll back the code, which can be as cumbersome as Team X’s merge. With toggles it is easier: turn the feature off through configuration and deploy again.
Apart from Release Toggles, you can also use toggles to test features quickly with techniques such as A/B testing. Most of you are probably aware of A/B testing, a technique for testing multiple versions of a page or feature. These are quick experiments run with groups of real users to see which version performs better.
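A sketch of how users might be split between two variants of an experiment, assuming a hypothetical ab_variant helper. Hashing the experiment name together with the user id (SHA-256 from Ruby’s standard Digest library) keeps each user in the same variant on every visit without storing any assignments.

```ruby
require "digest"

# Assigns a user to one of the variants of a named experiment.
# Hashing "experiment:user_id" gives a stable, roughly uniform assignment,
# so a returning user always sees the same version.
def ab_variant(experiment, user_id, variants = %w[A B])
  digest = Digest::SHA256.hexdigest("#{experiment}:#{user_id}")
  variants[digest.to_i(16) % variants.size]
end

# Usage: render the page version chosen for this user.
variant = ab_variant("homepage_redesign", 42)
puts "user 42 sees homepage version #{variant}"
```

Including the experiment name in the hash means the same user can land in different groups for different experiments, which keeps experiments independent of each other.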
Another technique is Canary releasing, where a feature is released to users gradually rather than to the entire user base at once. This way, any surprises with the feature can be caught and fixed early, before they affect the entire user base.
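A sketch of a percentage-based canary toggle, assuming a hypothetical canary_enabled? helper and a hypothetical rollout plan. Each user is mapped to a fixed bucket from 0 to 99 using Zlib.crc32 from Ruby’s standard library, and the feature is on when the bucket is below the current rollout percentage.

```ruby
require "zlib"

# Canary-release sketch: a feature is on for a stable slice of users.
# Each user id is mapped to a bucket 0..99 with CRC32; the feature is on
# when that bucket is below the current rollout percentage, so raising
# the percentage only ever adds users, never swaps them.
def canary_enabled?(feature, user_id, percentage)
  bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
  bucket < percentage
end

# Hypothetical rollout plan: widen the audience step by step while
# watching errors and metrics between steps.
[5, 25, 50, 100].each do |pct|
  on = (1..1000).count { |uid| canary_enabled?("new_checkout", uid, pct) }
  puts "#{pct}% rollout: feature on for #{on} of 1000 users"
end
```

Because a user’s bucket never changes, someone who saw the feature at 5% keeps seeing it at 25%, and setting the percentage back to 0 turns it off for everyone.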
A good system should be designed for failure. A failure can be due to anything in the “unexpected” category, such as unexpected load on the system or an unexpected network or hardware failure.
For example, disabling an expensive recommendation engine is better than letting it consume resources at peak times. You can either disable the feature completely or offer a degraded version, e.g. showing a cached, possibly outdated copy of the data.
That’s where Ops Toggles come into the picture. You can create toggles for such features and remove them later, once you have gained some confidence.
You can also think of them as manually managed circuit breakers that keep a single feature from halting the entire system.
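A sketch of an Ops Toggle guarding an expensive feature, assuming a hypothetical RecommendationService whose engine is passed in as a block. When operators switch the toggle off, the service serves the last cached result (possibly outdated, or empty) instead of calling the engine, much like a manually tripped circuit breaker.

```ruby
# Ops toggle sketch: an operator can switch off an expensive feature during
# peak load and serve a degraded, possibly stale, cached result instead.
# RecommendationService and the :live_recommendations toggle are hypothetical.
class RecommendationService
  def initialize(ops_toggles, &compute)
    @ops_toggles = ops_toggles # shared Hash, e.g. { live_recommendations: true }
    @compute = compute         # the expensive call, e.g. a recommendation engine
    @cache = {}                # last good result per user
  end

  def for_user(user_id)
    # Ops toggles default to on: the feature runs unless explicitly disabled.
    if @ops_toggles.fetch(:live_recommendations, true)
      @cache[user_id] = @compute.call(user_id) # live call, refreshes the cache
    else
      @cache.fetch(user_id, [])                # degraded: stale or empty list
    end
  end
end

# Usage: an operator flips the toggle during peak load.
ops_toggles = { live_recommendations: true }
service = RecommendationService.new(ops_toggles) { |uid| ["bestseller-for-#{uid}"] }
service.for_user(7)                        # live call, result cached
ops_toggles[:live_recommendations] = false
service.for_user(7)                        # served from cache, engine not called
```

Unlike a release toggle, this one defaults to on, because its job is to let operators shed load, not to hide unfinished work.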
What about architectural changes?
Every approach has pros and cons, and feature toggles are no different.
Feature toggles introduce if...else branches, which add to code complexity. Ideally, a toggle should exist only for a short time, until the experiment is done or the feature has reached all users. But the ideal can be far from reality, which leads to issues like these.
This is what happens when you have many toggles in the system. Imagine the combinations they create: n independent on/off toggles allow 2^n possible configurations, so ten toggles already give 1,024.
There is no “silver bullet” that works for every team, because each team’s context is different. It is similar to asking “How do we pay off technical debt?” or “How do we create fewer bugs?”
In my experience, it takes time to build the discipline of removing toggles, as with any new practice, for example TDD.
One way is to give each toggle an “expiry date”, i.e. put a process in place to remove any toggle that outlives a certain duration.
I’ve read that some teams automate this: they add scripts that check the toggles’ expiry dates and fail the build once a toggle has passed its date. I haven’t tried that myself, but it looks very interesting.
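A sketch of such an automated check, assuming the expiry dates are kept in a hypothetical TOGGLE_EXPIRIES registry; the toggle names and dates are invented for illustration. A CI step could exit with the returned status so the build fails while an expired toggle remains.

```ruby
require "date"

# Hypothetical registry: each toggle records the date by which it
# should have been removed from the codebase.
TOGGLE_EXPIRIES = {
  new_checkout: Date.new(2016, 3, 1),
  recommendations_v2: Date.new(2099, 1, 1)
}

# Names of toggles whose expiry date is before `today`.
def expired_toggles(expiries, today = Date.today)
  expiries.select { |_name, expires_on| expires_on < today }.keys
end

# Returns a process exit status: 0 if no toggle has expired, 1 otherwise.
# A CI step could run exit(check_toggle_expiry(TOGGLE_EXPIRIES)) so the
# build fails until the expired toggles are removed.
def check_toggle_expiry(expiries, today = Date.today)
  expired = expired_toggles(expiries, today)
  return 0 if expired.empty?

  warn "Expired feature toggles, please remove: #{expired.join(', ')}"
  1
end
```

Passing `today` as a parameter keeps the check easy to test without depending on the real clock.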
The GitHub workflow is a model used by many teams. It is a good model for open source projects, bringing some rigor to the bazaar. But in a more controlled environment, relying on interdependence within the team might be better than a rigorous code review process.
Let’s look at a changed Team X.
This project is in the microfinance space, where small loans are given to people who want to start a business but don’t have much creditworthiness. It is based in New York.
We all visit clinics for various reasons: checkups for our parents or close relatives, visits for our kids, or for ourselves. Usually we wait in the reception area until it is our turn, see the doctor, summarise our problem, hear the recommendations and thoughts, take the prescription and go home.
Ann, who is diabetic, is visiting the St. Mary diabetic clinic. While waiting for her appointment, she notices a tablet placed in the room, with videos about how other people like her have overcome diabetes. Out of curiosity, she picks up the tablet, watches the videos again, takes notes, and later realises that she can share them to her email. She writes down some questions about what she saw in the videos that she wants to discuss with the doctor.
When her turn comes, rather than being a nervous patient, Ann is optimistic and looking forward to clearing up her questions with the doctor. The clinic started seeing a positive impact of these videos on patients, and doctors confirmed that their discussions with patients also improved significantly: the videos give patients a clear perspective on their specific disease or problem, which helps the doctor build on that during the conversation.
Once the tablets proved useful, they extended the content to the reception area too, on large display TVs.
The company approached us to build the content delivery app, and later to build custom remote management functionality so that these devices could be controlled from a central location.
Julio is a baker who wants to start his own bakery, but he doesn’t have much savings, so the banks won’t lend to him. He also wouldn’t be able to afford a normal bank’s interest rate.
That is when someone tells him about the X Microfinance Institute, and Andy approaches him saying, “Provide us some collateral and we will give you a loan.”
Julio provides the collateral details, and the underwriting team recommends the loan amount.
This institute serves many Julios, helping them build their businesses.
They had some automated and some manual processes covering everything from loan origination to final closing. The founder of the institute approached us to build the entire end-to-end system, as he could not scale with a half-automated one.
We sliced it into smaller systems that expose APIs wherever system-to-system interaction is required.
We started with the loan processing system:
Julio submits the loan application with collateral
Andy reviews and approves it
Julio signs the documents (e-sign)
Loans are not restricted to individuals; companies can apply too, but then the collateral and approval process varies, and multiple people sign the loan.
Tapabrata Pal, Director and Engineering Fellow at Capital One, speaks about his experience of moving to mainline development and its impact at Capital One.
Capital One is one of the ten largest banks in the US, and is known for its innovative approach to customized services and offerings.
Capital One has been adopting CD for the last few years, introducing automation and deployment pipelines. When they recently started measuring, they found that their branching strategy was one of the things slowing them down.
They were able to go from 40 deploys per day to over 800 deploys per day with zero increase in incidents, all in just two months.
This is the rejuvenated Team X: a team with better predictability and resilience, now working on more creative tasks.