What does it look like for a team to adopt Docker and GitLab CI/CD when they haven’t been using them previously?
With our first CI/CD run we decided to commit fully to the GitOps philosophy, and in the years that followed we hit a lot of obstacles. But these experiences welded us together even more. Using GitLab, we invented a lot of new tools, but the most important thing was that we built everything together, bottom up, as individuals.
Follow my talk and see how we started our adventure through the endless space of the GitLab galaxy, and hear how we use our experience, our knowledge and our team friendship to build creative, innovative solutions today. You will see that with human friendship you can go where no one has gone before.
4. #GitLabCommit#GitLabCommit
T-3 years (2014)
● GitLab since 2014
● GitLab for developers “Git” only
● OpenVZ environment
● Semi automatic deployments
To Go Where No One Has Gone Before...
Source: https://images.nasa.gov/details-6754387
T-1 year (2016)
● Start using Docker
● Manual builds
● Containers are nice… but… hard to tackle if you do it manually
First transition
Source: https://images.nasa.gov/details-MAF_19671005_S1C_ViewtoWest
T+1 second
● First pipeline run - canceled
● Next runs - canceled!
● Run #9 - Successfully failed!
● Run #14 - First success!
Hmmmm… failures
Source: https://www.nasa.gov/images/content/618280main_LAS_apolloaborttest.jpg
T+3 months
● More pipelines
● More jobs
● More projects
● More of everything
Faster, Further, Higher
Source: Apollo 2, Apollo 3, Apollo 5, Apollo 6, Apollo 7 launch taken from Wikipedia NASA public domain
T>4 months
● Recreate everything
● More backends
● More Docker Swarm (true!)
● More GitLab projects
● More Load Balancers
● GitLab includes (10.2018)
Reboot from scratch
Source: https://www.flickr.com/photos/spacex/40126460511/
T>2 years
● GitOps!
● For us the only way
● Hundreds of projects
● Different knowledge
● Different software stacks
GitOps
Source: https://www.youtube.com/watch?v=bvim4rsNHkQ
T>2 years
● Kanban board
● Talk with each other
● Listen to and learn from other people
Structure work
Source: https://www.youtube.com/watch?v=bvim4rsNHkQ
Source: self made
Hello and welcome to my talk “To Go Where No One Has Gone Before”.
NEXT SLIDE!
My name is Mario. My team and I have been using containers for more than ten years, and before we started to use GitLab, CI/CD pipelines and GitOps in general, life was much harder for us!
Like many, we worked evenings and weekends to get software up and running and into production. Inside the IT department of the STRABAG construction company, more than 60 developers now work on more than 110 internal web-based software projects. My team and I are responsible not only for the infrastructure of these web applications but also for many other services needed to power this infrastructure, like DNS, Puppet, Zabbix and of course GitLab.
You might ask, why is this important for you to know? Well, my team consists of only four people including myself.
NEXT SLIDE!
So, this is a story about a transition. More than 14 years ago my colleagues and I had only one web application to host. Now, 14 years later, we manage more than 3800 containers, more than 110 web-application projects and more than 600 load balancers, including automatic Let's Encrypt certificate updates, plus many other services. Without multiple transitions, and without a deep sense of automation, this would be impossible for just four people to handle. Still asking why? You just don't want to work with a system that can fall apart at any time. Instead, you want a reliable system where you know what is going on.
In the upcoming slides, I will show you what life was like back then, what caused us to go the “To Go Where No One Has Gone Before” way with GitLab, and why we are going this way.
So we’ll follow the story using the analogy of a rocket launch: from the preparation, to the launch, through the reinvention, to where we are now, and the lessons we’ve learned along the way.
NEXT SLIDE!
Say some words about the picture: Saturn V first stage awaiting assembly at NASA’s Michoud Assembly Facility in October 1967
GitLab since 2014
Oldest existing project (ID 2) dates back to 2014-10-09 09:56:59
GitLab for developers “Git” only
Before Git, we used a central SVN installation, but this was a pain because...
... we had to write our own management tooling for it, to assign permissions to repositories and so on
OpenVZ environment
Back then, as mentioned in the introduction, we were already using containers, and in 2014 we had about 400 of them, based on OpenVZ
OpenVZ was great, but there were no batteries included, which means that we had to do everything ourselves
For example, we had to manually implement a process to deliver the software onto our servers, and this was a very complex setup that was hard to handle
Semi automatic deployments
Basically it was a self-made CI/CD process consisting of the following steps, just to give you a clue about the complexity:
Dev WAR package -> FTP server -> picked up by a central server which took the WAR packages (Debian, Python, C) and created an operating-system-compatible package, for example a Debian APT package -> create AT jobs -> hopefully everything gets installed
This tool chain was only partially automated and we had media breaks everywhere (FTP to server, packaging, delivery, execution)
This is why handling even just 400 containers and about 20 web-application projects this way was not future-proof
To explain it a little bit more, I made a simple comic to show what the CI/CD process was back then.
NEXT SLIDE!
To sum it up: there was always a difference between what the developers thought was installed on the server and what was actually installed on it
This misunderstanding was one of the main reasons why we had to handle several outages back then
Therefore we started the first transition to change the system and, to be honest, as the system grew we did this several times. But let’s start with the first one!
NEXT SLIDE!
Say some words about the picture: Saturn V first stage awaiting assembly at NASA’s Michoud Assembly Facility in October 1967
Start using Docker
At the end of 2016 and the beginning of 2017 we started to use Docker instead of OpenVZ
because the container networking problems were “solved” by the overlay network feature introduced with Docker Swarm
The overlay network solved multiple problems for us, because running Kubernetes on-premises in 2017 wasn’t really an option
The main reason not to go with Kubernetes was that the Kubernetes CNI plugins were designed for huge setups and/or needed a lot of additional work or had too many moving parts
So why did we use, and why are we still using, Docker Swarm on-premises? Because it is super simple to set up, and with the built-in overlay network there is a secure and easy-to-use network provider
But today we are using Kubernetes too
Manual builds
But even though we took a step forward with Docker, we still had manual builds
Therefore we did a lot of prototyping to find a proper way to eliminate these manual steps
But we did not find a good practice on how to proceed back then
Containers are nice… but… hard to tackle if you do it manually
What we learned was that Docker could give us a better developer and operator experience through faster development and deployment cycles, but only if we could eliminate the manual steps
NEXT SLIDE!
Say some words about the picture: Saturn V first stage awaiting assembly at NASA’s Michoud Assembly Facility in October 1967
TOGETHER, Dev and Ops!
Together, for the first time, we held a joint Dev and Ops meeting.
The reason for this meeting was that the developers needed some kind of automation for their build process
In this meeting, the first idea was to use Jenkins, but thankfully GitLab already offered GitLab CI/CD with GitLab pipelines
So we decided to go with GitLab Runners and to use GitLab to automate the build processes for the developers. In return, we agreed together that we would implement the CD process too
So no CI without CD
That was the main and most important breakthrough, because now we started to work together to make the transition real
AND THEN...
2017-04-28 08:04:48 (GMT)
Hip Hip Hooray
At exactly this time and date we ran our first pipeline and started our journey!
This was a really exciting moment for us
BUT … NEXT SLIDE!
Say some words about the picture: Apollo abort test
First pipeline run - canceled
The first pipeline was canceled because we had made a configuration mistake that would have led to an infinite pipeline, so it never finished.
Next runs - canceled!
Funnily enough, the same situation happened another 8 times :)
Run #9 - Successfully failed! YES!
This was a step forward because we learned how to use the system even if it was failing.
Run #14 - First success!
But hooray, run #14 was our first successful pipeline run!
These first lessons showed us that we also had to agree on a new “Failure Culture”. Failures will happen, in operations or in development, but we agreed not to blame each other; instead we work to find the root cause together.
NEXT SLIDE!
What you see here is an overview of the first pipeline runs we ever did - three years ago!
The first ones were canceled
Then we had some failed ones
And finally we managed to run it successfully
So what happened after these first baby steps into GitLab CI/CD?
NEXT SLIDE!
Say some words about the picture: Apollo launches 2,3,4,5,6
Well, every successful system will grow by itself, as the picture on the right shows for the Apollo missions
More pipelines
More jobs
More projects
More of everything
And to show you what happened within the first 3 months after the initial start, I will show you some numbers now.
NEXT SLIDE!
From the first pipeline and job runs in 2017 we rose to 1400 pipeline runs and 4000 job runs only 3 months later!
From our point of view, this growth happened because, for example, one sub-team used the new processes, and therefore the GitLab pipelines, and another sub-team saw that teams using these pipelines saved a huge amount of time - which in turn led to a domino effect.
BUT… when you reach the end of the first transition after some months, something might go wrong ….
SO … NEXT SLIDE!
Say some words about the picture: Apollo 13 Kennedy Space Center, shot during abort
This image says it all: flight engineers, astronauts, and many more
Houston, we have a problem!
… so it’s a picture that clearly shows what we had to do at this point
We had to change our work culture
because we still had manual processes inside our GitLab projects. For example, we use multiple Git repositories: one for the source code and one for the deployment and its configuration
These repositories are hard to set up at scale if you do it manually
Especially if you use GitLab CE (like we do), you have to invent some glue code at this point to hand over the pipeline run between these two repositories, the development repository and the deployment repository
In the end, exactly this glue code was implemented and established by our devs - thanks!
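For readers of these notes: the talk does not show the glue code itself, so the following is only a minimal sketch of how such a hand-over could look. It assumes GitLab's pipeline trigger endpoint (`POST /projects/:id/trigger/pipeline`) is used; the host name, project ID, token, ref and the variable `IMAGE_TAG` are hypothetical placeholders, not values from our setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trigger_request(gitlab_url, deploy_project_id, trigger_token, ref, variables):
    """Build (but do not send) the request that starts a pipeline in the
    deployment project through GitLab's pipeline trigger endpoint."""
    url = f"{gitlab_url.rstrip('/')}/api/v4/projects/{deploy_project_id}/trigger/pipeline"
    form = {"token": trigger_token, "ref": ref}
    # Pipeline variables are passed as variables[NAME]=value form fields.
    for name, value in variables.items():
        form[f"variables[{name}]"] = value
    return Request(url, data=urlencode(form).encode("utf-8"), method="POST")


# Hypothetical values, for illustration only.
request = build_trigger_request(
    "https://gitlab.example.com", 42, "TRIGGER_TOKEN", "main", {"IMAGE_TAG": "1.4.2"}
)
```

A job at the end of the source project's pipeline could send this request with `urllib.request.urlopen(request)`, so that the deployment repository picks up where the build left off.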
What does this mean?
Going meta!
Welcome to the GitLab metagame!
Going meta means that we had to simplify our multi-project, multi-stage Git project setups
This multi-repository setup was too complex to be handled by people who are not able to dive deep into the nitty-gritty details
For example, we had to set up various GitLab CI/CD variables manually to use the power of the automated build and deploy capabilities across multiple projects
Furthermore, pipeline variables can change over time; for example, new variables may need to be added to the process
Also, the functions of GitLab stages can be extended or changed over time
Our solution was to create a GitLab project whose pipelines are used to create GitLab projects, namely the source projects and deployment projects for a product, which are already set up with the pipelines and variables. The developer can therefore just use the build and deploy pipelines within these automatically created Git projects without needing to know every small detail - a huge step forward
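To make the idea more concrete for readers, here is a rough sketch of what such a meta pipeline might compute before it talks to GitLab. It is not our actual implementation: the `-source` and `-deploy` naming, the namespace ID and the variable names are assumptions for illustration. Each plan entry is meant to map onto the GitLab REST API calls `POST /projects` and `POST /projects/:id/variables`.

```python
def plan_product_projects(product, namespace_id, shared_variables):
    """Return one plan entry per GitLab project to create for a product.

    Each entry holds the payload for POST /projects and the payloads for
    POST /projects/:id/variables that pre-configure CI/CD variables.
    """
    plan = []
    # Hypothetical convention: one source project and one deployment project.
    for suffix in ("source", "deploy"):
        plan.append({
            "create_project": {
                "name": f"{product}-{suffix}",
                "namespace_id": namespace_id,
            },
            "ci_variables": [
                {"key": key, "value": value}
                for key, value in sorted(shared_variables.items())
            ],
        })
    return plan


# Hypothetical product and variables, for illustration only.
plan = plan_product_projects(
    "shop", 7, {"REGISTRY": "registry.example.com", "DEPLOY_ENV": "test"}
)
```

Keeping the plan as plain data means it can be reviewed or re-run when variables change, which is exactly the situation described above.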
Care about the Doers!
This means that every member of a team is different
You will have some people who are happy to be on the edge of development
And other developers who do the really important job of keeping the system stable and reliable - the Doers. We need to take care of this situation, and automation can help to keep both groups, the Edge people and the Doers, close together
Here is a little comic about it … NEXT SLIDE!
At the beginning of a transition, the Doers and the Edge people are close to each other
Think about a rubber band around these two groups of people; the rubber band is elastic and can be stretched up to a certain point
As time moves on, the Doers cannot keep up with the pace of the Edge people, because the Doers make sure that all the new, cool and hot stuff gets into shape to be used by everyone in a reliable way
Therefore the rubber band is stretched more and more
If the Edge people proceed too fast, the rubber band will break and the Doers will be lost
In the real world this means that the Doers won’t be there anymore, and therefore, in a worst-case scenario, no one will take care of integrating the new and cool stuff into everyday work anymore. Ultimately the reliability of the system will be lost, and this will cause outages sooner or later.
This is why it is important to keep an eye on both groups, the Edge people and the Doers
… NEXT SLIDE!
Say some words about the picture: SpaceX Falcon Heavy, a picture of redoing something
Recreate everything
As shown in the last three slides, the system grew, and therefore we hit some limitations of our initial design of how we work; for example, we had to handle the multi-project, multi-stage pipeline problem
We had to take care of the Doers. It is important NOT to leave them behind, therefore we needed to simplify the pipeline usage
And to do that, we needed simpler steps to set up the environment because of the growth of the system
More backends
More Docker Swarm (true!)
More GitLab projects
More Load Balancers
GitLab includes (10.2018)
GitLab includes enabled us to put various pipeline templates into a single central place. Without them, it would not be possible for us to manage the still rising number of GitLab projects
The use of GitLab includes makes it easy for us to set up GitLab pipelines without copying the logic of the .gitlab-ci.yml into every single project - so no code duplication
Thanks to GitLab includes, we can roll out changes to the GitLab pipelines and stages, such as CI/CD pipeline variables, without having to refactor every single project when something changes or needs to be changed. This is a super benefit for us because we currently have around 2000 projects in our on-premises GitLab, and without the includes we would have been lost
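As an illustration for readers, the per-project `.gitlab-ci.yml` can shrink to little more than a list of includes. The sketch below renders such a file using the `include:project` form of current GitLab versions; the template project path, file names and ref are hypothetical, and our real templates are not shown in this talk.

```python
def render_ci_file(template_project, template_files, ref="main"):
    """Render a minimal .gitlab-ci.yml that only pulls in central templates."""
    lines = ["include:"]
    for path in template_files:
        # One include entry per template file in the central project.
        lines.append(f"  - project: '{template_project}'")
        lines.append(f"    ref: '{ref}'")
        lines.append(f"    file: '{path}'")
    return "\n".join(lines) + "\n"


# Hypothetical template project and files, for illustration only.
ci_file = render_ci_file(
    "infrastructure/ci-templates", ["/build.yml", "/deploy.yml"]
)
print(ci_file)
```

Because every project only references the central templates, a change to a template takes effect in all projects on their next pipeline run without touching the projects themselves.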
All these changes enabled us to grow once again, but once again we had to change our working culture. And in our case we moved over to
… NEXT SLIDE!
Say some words about the picture: the Saturn V flight configuration is a nice metaphor for the GitOps way of working
GitOps
No changes without commit
No manual changes
No cheating the systems (by hacking into consoles)
And for us it is the only way….
For us the only way
...because we have to support
Multiple environments like dev, test, staging, education and production, in multiple infrastructures like on-premises and clouds
These environments are often represented by branches inside the deploy Git repositories, and of course there are also different branches for these environments inside the source-code repositories
Any combination of branches and repositories is possible and needed, because not every project uses the latest software stack - there are also legacy applications which need a simpler or a more complex setup
Furthermore, we have different types of pipelines, for example scheduled, manual or fully automatic ones
This is all handled by central templates, and without GitOps it wouldn’t be manageable, because...
Basically, in our environment a deployment is normally triggered by a Git tag. This gives developers and operators the opportunity to push the latest source code or the latest configuration changes as often as they wish to save their work. The pipelines only run when a Git tag is pushed or created.
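The tag rule can be sketched in a few lines. This is only an illustration of the decision, not our pipeline code: it relies on GitLab's predefined variable `CI_COMMIT_TAG`, which is only set in pipelines that run for a tag; the function name `should_deploy` is made up for this example.

```python
import os


def should_deploy(env=None):
    """Return True only when the pipeline runs for a Git tag."""
    env = os.environ if env is None else env
    # CI_COMMIT_TAG is absent (or empty) for ordinary branch pushes.
    return bool(env.get("CI_COMMIT_TAG"))


# Illustration with hand-made environments instead of a real CI job.
print(should_deploy({"CI_COMMIT_TAG": "v1.2.0"}))
print(should_deploy({"CI_COMMIT_BRANCH": "main"}))
```

In a `.gitlab-ci.yml` the same condition is usually written as a rule on `$CI_COMMIT_TAG` rather than in a script, so that ordinary pushes do not start the jobs at all.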
Hundreds of projects
And now, imagine, that we are talking about several hundreds of projects
In our case we currently have around 2000 GitLab projects
Different knowledge
GitOps and this level of automation are needed because there are different levels of knowledge inside the teams and not everyone can take care of every detail
Different software stacks
And as said above, we have to support a very heterogeneous environment with a lot of different software stacks
So this is why GitOps is key for us! But GitOps is also a cultural change, and therefore we also adopted some publicly known GitLab.com strategies...
… NEXT SLIDE!
…. one of them is the Kanban board
Kanban board
We have started to create issues for nearly everything
This helps us, even in a small team, to always have an overview of what is currently going on
For example, we do it to get a clue about what our everyday work is and, most importantly, what work is probably holding us back - for example unplanned work or work which is not tracked anywhere else
I can only recommend doing something like that, because it really helps the team not only to be more focused on the real work but also to be more creative. If you are interested in reading more about it, I suggest the book “The Phoenix Project”
Talk with each other
The next thing we’ve adopted is that we listen to each other, because there are often different points of view on the same problem
Listen to and learn from other people
And, for sure, always listen to the stories and learn from the experiences of others
And now finally you might still ask, why are we doing all this stuff? And what effect does all this have on us and the system?
Well…
… NEXT SLIDE!
… within two more years, from 2018 to 2020
The number of pipelines in operation rose from around 4000 to more than 60000 per month
And the number of jobs within these pipelines rose from approximately 15000 to a little more than 100000 per month
To sum it up, as a team of developers and operators we have managed to achieve almost ten times the throughput of two years before. And besides that, we are now also future-proof.
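For readers who want to check the arithmetic, here is a small sketch using the rounded monthly figures quoted above. The exact counts are not given in the talk, so the results are approximations.

```python
def growth_factor(before, after):
    """How many times larger the later value is than the earlier one."""
    return after / before


pipeline_growth = growth_factor(4000, 60000)   # pipelines per month
job_growth = growth_factor(15000, 100000)      # jobs per month

print(pipeline_growth)        # 15.0
print(round(job_growth, 1))   # 6.7
```

The pipeline count grew about fifteenfold and the job count about sevenfold, so "almost ten times" is a rough summary that sits between the two.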
But besides all these numbers, for me the most important points of all are still ....
… NEXT SLIDE!
… my team and the people I am allowed to work with.
Nothing in the story I told you would have been possible without the people who have the passion and the motivation to make these transitions and to evolve the system. Therefore it’s an honor for me to show them on this slide today - these are the real heroes - because people are the real heroes when it comes to transitions!
Without them it would not be possible to “Go Where No One Has Gone Before”!
Thank you team!
… NEXT SLIDE!
… and finally to close my talk,
Attempt to automate wherever it’s appropriate!
I know it’s hard to start, but with every step you take it gets easier, and it helps you to get back more slack time to be creative!
… and then
YOU(!) Can Go Where No One Has Gone Before!
THANK YOU!