Using Git/Gerrit and Jenkins to Manage the Code Review Process

ESC-4024

Presenters : Marc Karasek & Phil Hord

Code Review – What is it and why do we do it?

The idea of the lone-ranger programmer, cranking out code alone in a cube or office, is a nice romantic
idea. In reality it tends to produce code that is obfuscated and unmaintainable. Making code review
part of your development flow, however, leads to more maintainable code.

The ‘ideal’ code review system:

1. Web interface – Allow access from multiple development sites.
2. Allows pre-commit code reviews
3. Can handle a large number of repositories
4. Inline comments and block comments
5. Integration with a build server.
6. Review Process Workflow that can be integrated into the development process –
Developer does not have to do anything “extra” to start review process.

Let’s take each of these in order:

Web Interface
Today we have development teams spread around the world. The old adage that “the sun never sets on
the British Empire” could be applied to some of our current development teams. With developers not
always located in the same place or time zone, a web interface is important because it makes code
review a process that does not depend on sitting around a table. A developer can submit code for
review, and a teammate can review it on arriving at the office.

Pre-commit code reviews
One of the biggest problems facing code review is how to satisfy two requirements at once: the code
must be under SCM, yet unreviewed code must not affect the current code base. There are many ways to
implement this; keeping a separate (sandbox) repository for untested/unreviewed code and submitting
patchsets for changes into the SCM are two of them. The problem is that most of these methods add
overhead to your development process: either maintaining two repositories, one for production and one
for development, or adding steps to the development process to create the patchset for a change to
be reviewed.

Can handle a large number of repositories
Development teams today work on multiple projects; each one normally has its own code base that
needs to be maintained. Being able to maintain a large number of different repositories, while not a
major issue with SCM systems today, is worth mentioning. It can become an issue depending on how the
SCM stores the repository and how much space it takes on the server.

Inline comments and block comments
This is important because it allows reviewers not only to comment on the change as a whole, but also
to add comments inline in the patchset/code being reviewed. Think of a block comment as a global
comment on the change, such as “The commit message needs more detail to describe the change better,”
and an inline comment as a local comment in the code, such as “This variable is also used in file
ABCD.c; check that file to make sure we do not have an issue.” Both types of comments, inline and
block, should be part of the code review history/process.

Integration with a build server
Projects that share code across platforms need to be able to cross-check common code against multiple
build targets. Having a build server that can do the ‘grunt’ work of building multiple targets for a
code base puts a check in place that does not depend on a developer doing the builds. With some
projects having many targets, a build server helps to automate and standardize the process.

Review Process Workflow that can be integrated into the development process
The trick is to integrate code review so that it is part of your ‘normal’ code development process. If
there is any “exception” path that allows engineers to bypass code review for emergencies, it will
become the normal path. From the developer’s point of view, the code review process should have a
minimal impact on the development process. The best case is that the developer’s normal
check-in/commit process for submitting code into the SCM is the code review process.

Current Processes

Most code review systems/processes generally fall into one of three models:

1. Code is checked into a temporary holding branch for review. Once it has been reviewed, it is
then merged into a master/release branch. This merge can be done by either the
original developer or a dedicated build/repository manager.
2. Code is kept locally on the developer’s machine and is posted/emailed for review. Once it has
been reviewed, it is the responsibility of the developer to merge it into the release/master
branch.
3. Separate branches are maintained for release and development. The development branch is
never guaranteed to build but always has the latest and greatest in it. Code may be checked into
this branch with no review. Once it is checked in, reviewers are notified and provided a link to the
commit for review.

Each of these processes has its good and bad points. What all of them lack is a way to automate the
review process. This includes:

1. Being able to cherry-pick/pull a patchset into a local repository for review/testing
2. Reviewing the changes without pulling the code down to your local machine
3. Reviewing the history of the change:
   a. How many times it has been through the review process
   b. What other reviewers’ comments are

Let us see how the above processes stack up against the ‘ideal’ code review system.

Web Interface
All of the above could have some kind of web interface for accessing the code under review. This could
range from something as simple as a patchset sent via email to a web-based GUI. Regardless of the
method, this adds an extra step to the development process. The engineer has to package the changes
into a patchset and then either send it to an email list or post it to a web site. This adds time to
the development process and does not allow good tracking of review changes. The normal process would
be for the developer to receive feedback, generate a new patchset, and then send/post this new change.
There is no explicit link between the old and new changes.

Pre-commit code reviews
Only some of the above, #1 and #3, handle this requirement. For these two the code is checked into a
holding area/development branch for review, prior to being merged over to the release/master. #2
fails this requirement, as the change lives only on the developer’s machine, and if that machine has
an ‘accident’ the changes are lost.

Even the ones that meet this requirement have problems. As with the previous requirement, this adds
to the development process. The code needs to be merged over after review. This is handled either by
the developer or by a dedicated build manager. In the end it is a manual step that adds time and takes
up resources.

Can handle a large number of repositories
Most modern SCM systems handle a large number of repositories. This affects the review process very
little and is best left for a separate discussion.

Inline comments and block comments
Most current review processes fail this requirement. Being able to view other reviewers’ comments on
a file or about the overall change is an invaluable resource that helps to streamline the review
process. Being able to review past comments on the change also shortens review time; no one gets it
right the first time, which is why we do code review.

Integration with a build server
This is normally a manual step in the review process, where a developer has to submit a job to the
build server. At best it is somewhat automated, in a nightly or weekly build that pulls in all
currently submitted changes and attempts to build them.

Where this fails is that, of all the processes, only #1 above, where the change is contained in its
own holding area, could be built. For #3, the development branch is never guaranteed to be
buildable, so most of the time this adds time to the process: someone has to find out why the
nightly/weekly development build failed, inform the engineer who submitted the code, etc. For #2,
there is no way for the build server to get the code, as it is on the developer’s machine.

Review Process Workflow that can be integrated into the development process
All three of the above add steps to the process. For the developer, getting code submitted becomes a
multistep process. Developers have to learn a ‘new’ process and how to use it in their development,
for example how to properly generate the patchset so that it can be reviewed by the team, or how to
package their changes to submit them through a web interface for review.

Introducing : Git / Gerrit / Jenkins

Using Git as the SCM with Gerrit as a frontend addresses most of the above requirements. Adding Jenkins
as a build/integration server covers the requirements that Git/Gerrit alone does not.

Web Interface
Gerrit provides a web interface that allows code review, patchset generation, cherry-picking, etc. of
patchsets that have been submitted for review. Access to this web interface and the underlying
repositories can be access controlled so that developers only have access to the projects that they are
working on.
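
As a small illustration of the cherry-picking mentioned above: Gerrit publishes every patchset under a
ref of the form refs/changes/<last two digits of the change number>/<change number>/<patchset>, so a
reviewer can fetch one with plain Git. The sketch below only builds the ref name and the matching
commands; the helper names, the remote name "origin" and the change number used in the example are
assumptions made for illustration.

```python
def change_ref(change_number: int, patchset: int) -> str:
    """Return the ref under which Gerrit publishes one patchset of a change."""
    if change_number < 1 or patchset < 1:
        raise ValueError("change and patchset numbers start at 1")
    # Changes are sharded by the last two digits of the change number.
    shard = f"{change_number % 100:02d}"
    return f"refs/changes/{shard}/{change_number}/{patchset}"


def fetch_commands(change_number: int, patchset: int, remote: str = "origin"):
    """Git commands a reviewer could run to cherry-pick a patchset locally."""
    ref = change_ref(change_number, patchset)
    return [
        ["git", "fetch", remote, ref],
        ["git", "cherry-pick", "FETCH_HEAD"],
    ]


if __name__ == "__main__":
    # Hypothetical change 1234, patchset 5.
    for command in fetch_commands(1234, 5):
        print(" ".join(command))
```

The web interface offers ready-made download commands for each patchset, so in practice a reviewer
rarely has to assemble the ref by hand.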

It allows for a custom view of the patchset under review. A reviewer can choose to view any number of
lines that surround the change, up to the whole file. This allows each reviewer to view as much
information as they need, without having to check out any code.

Pre-commit code reviews
This one item alone makes Git/Gerrit worth using. Using Gerrit as a frontend provides a ‘standard’ Git
interface to the developers. They push their code to the Git server: no special check-in process, no
special software to install. The developer just pushes the code to a special ref, “refs/for/<branch>”,
that Gerrit understands. Gerrit then takes the changes, creates a patchset from them, and posts it to
its web interface for review.
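
A minimal sketch of what that push looks like, assuming the Gerrit server is configured as the remote
named "origin" and the target branch is "master" (both are assumptions; substitute your own). The
helper name is hypothetical; the command it builds is ordinary Git.

```python
def review_push_command(branch: str, remote: str = "origin"):
    """Build the git push that sends the current commit to Gerrit for review."""
    if not branch or branch.startswith("refs/"):
        raise ValueError("pass a short branch name such as 'master'")
    # Pushing to refs/for/<branch> asks Gerrit to open (or update) a review
    # for <branch> instead of updating the branch directly.
    return ["git", "push", remote, f"HEAD:refs/for/{branch}"]


if __name__ == "__main__":
    print(" ".join(review_push_command("master")))
```

Because the destination is just a ref name, the developer needs nothing beyond a standard Git client.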

The patchset is ‘held’ in Gerrit until the code has been reviewed. It can then be submitted into the
Git repository. The patchset can be updated, abandoned, resurrected, etc., all without affecting the
Git repository that it has been pushed to. This allows changesets to be in review and pending without
affecting the code base. The developer can also update the patchset based on comments made during
review: make the requested changes, amend the commit, and push it to the Git server again. Gerrit sees
that this is a new patchset based on a previous one and adds it to the review as patchset <x>.
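
Gerrit links a re-pushed commit to its existing review through the Change-Id footer in the commit
message, which is normally added by the commit-msg hook that Gerrit supplies. The sketch below is a
deliberately simplified model of that matching, not Gerrit's implementation; the ReviewQueue class and
its method names are hypothetical.

```python
import re
from collections import defaultdict

# A Change-Id footer is "Change-Id: I" followed by 40 hex digits.
CHANGE_ID = re.compile(r"^Change-Id: (I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_id(commit_message: str):
    """Return the last Change-Id footer in a commit message, or None."""
    matches = CHANGE_ID.findall(commit_message)
    return matches[-1] if matches else None


class ReviewQueue:
    """Toy model: pushes that share a Change-Id become patchsets of one review."""

    def __init__(self):
        self._patchsets = defaultdict(list)

    def push(self, commit_sha: str, commit_message: str) -> int:
        """Record a pushed commit and return its patchset number."""
        key = change_id(commit_message)
        if key is None:
            raise ValueError("commit message has no Change-Id footer")
        self._patchsets[key].append(commit_sha)
        return len(self._patchsets[key])
```

This is why the developer amends the commit rather than creating a new one: amending keeps the footer,
so the second push is filed as patchset 2 of the same review instead of opening a new review.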

Can handle a large number of repositories
All modern SCM systems can handle multiple repositories. Where Git stands out, though, is in the size
of the repository and how it stores the files.

For example, the Mozilla repository is reported to be almost 12 GB when stored in SVN using the fsfs
backend. Previously, the fsfs backend also required over 240,000 files in one directory to record all
240,000 commits made over the 10-year project history. The exact same history is stored in Git in only
two files totaling just over 420 MB. This means that SVN requires roughly 30x the disk space to store
the same history.
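
The 30x figure can be checked with a line of arithmetic. The sketch below uses the two sizes quoted
above as round numbers ("almost 12" and "just over 420"), so the result is only approximate; treating
1 GB as 1024 MB is an assumption.

```python
SVN_REPO_MB = 12 * 1024  # "almost 12" GB, taken as an upper bound
GIT_REPO_MB = 420        # "just over 420" MB, taken as a lower bound


def size_ratio(larger_mb: float, smaller_mb: float) -> float:
    """How many times larger the first repository is than the second."""
    return larger_mb / smaller_mb


svn_to_git_ratio = size_ratio(SVN_REPO_MB, GIT_REPO_MB)

if __name__ == "__main__":
    print(f"SVN needs about {svn_to_git_ratio:.0f}x the space of Git")
```

With these bounds the ratio comes out a little under 30, which is consistent with the rounded figure
in the text.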

Working copies are smaller as well. An SVN working directory always contains two copies
of each file: one for the user to actually work with and another hidden in .svn/ to aid operations such
as status, diff and commit. In contrast, a Git working directory requires only one small index file
that stores about 100 bytes of data per tracked file. On projects with a large number of files this
can be a substantial difference in the disk space required per working copy.
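
To put rough numbers on that per-working-copy difference, the sketch below compares the extra space
taken by SVN's hidden pristine copies with the size of a Git index. The project size (50,000 files of
8 KiB each) is invented for illustration, and the calculation deliberately leaves out the Git object
store under .git/, which holds the history discussed above.

```python
INDEX_BYTES_PER_FILE = 100  # "about 100 bytes of data per tracked file"


def svn_pristine_overhead(file_sizes):
    """Extra bytes for the second copy of every file kept under .svn/."""
    return sum(file_sizes)


def git_index_overhead(file_sizes):
    """Approximate size of the Git index for the same set of files."""
    return len(file_sizes) * INDEX_BYTES_PER_FILE


# Hypothetical project: 50,000 tracked files of 8 KiB each.
files = [8 * 1024] * 50_000
svn_overhead = svn_pristine_overhead(files)
git_overhead = git_index_overhead(files)

if __name__ == "__main__":
    print(f"SVN pristine copies: {svn_overhead / 1e6:.1f} MB")
    print(f"Git index:           {git_overhead / 1e6:.1f} MB")
```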

A similar comparison can be made between Git and CVS, where a 3x improvement in disk space usage
has been seen.

A side effect of how Git manages its repository is that each time you clone a repository you get
the full repository. All the history, etc. is cloned to the local machine from the server. This allows
developers to work on code, switch between branches, search history, etc. without having to be
connected to the ‘central’ SCM.

Inline comments and block comments
Gerrit allows reviewers to enter both inline and block comments on any patchset they are
reviewing. It also keeps a history of the patchset as it goes through the review process. This gives
the reviewer and the developer access to the full history of comments on the change.
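
One way to picture the two comment types and the history kept across patchsets is the small data model
below. It is an illustrative sketch only and does not reflect Gerrit's actual schema; every class and
field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Comment:
    patchset: int                # which patchset the comment was made on
    author: str
    message: str
    path: Optional[str] = None   # set only for inline comments
    line: Optional[int] = None   # set only for inline comments

    @property
    def is_inline(self) -> bool:
        """Inline comments point at a file; block comments cover the change."""
        return self.path is not None


@dataclass
class ReviewHistory:
    comments: List[Comment] = field(default_factory=list)

    def add(self, comment: Comment) -> None:
        self.comments.append(comment)

    def for_patchset(self, patchset: int) -> List[Comment]:
        """All comments, inline and block, made on one patchset."""
        return [c for c in self.comments if c.patchset == patchset]

    def inline_for(self, path: str) -> List[Comment]:
        """Inline comments on one file across every patchset."""
        return [c for c in self.comments if c.path == path]
```

Keeping the patchset number on every comment is what lets a reviewer look back at what was said about
an earlier version of the change.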

Integration with a build server
This is where the three amigos meet. Jenkins (the build server) can monitor and build against a
Gerrit/Git SCM system through its Gerrit Trigger plugin. This allows automated builds to be triggered
by a patchset being submitted into Gerrit. The developer does not have to do anything special to
trigger this event; it is automatic, based on the patchset and the branch it is being pushed to in
Gerrit/Git.

This can be used to build a set group of targets based on a given branch, or all of the targets that a given
project builds for.
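
The branch-to-targets choice described above amounts to a lookup. The sketch below models it in plain
Python; it is not Jenkins or Gerrit Trigger configuration, and the branch patterns and target names
are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from branch pattern to the targets built for it.
BUILD_MATRIX = {
    "master": ["arm", "x86", "mips"],
    "release/*": ["arm", "x86"],
}

# Hypothetical full list of targets the project builds for.
ALL_TARGETS = ["arm", "x86", "mips", "simulator"]


def targets_for(branch, matrix=BUILD_MATRIX, default=ALL_TARGETS):
    """Return the targets to build for a patchset pushed for `branch`."""
    for pattern, targets in matrix.items():
        if fnmatchcase(branch, pattern):
            return list(targets)
    # Branches without a specific entry build every target.
    return list(default)


if __name__ == "__main__":
    for branch in ("master", "release/1.0", "feature/new-driver"):
        print(branch, "->", ", ".join(targets_for(branch)))
```

Falling back to every target for unlisted branches is one possible policy; a team could just as well
fall back to a minimal smoke build.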

Review Process Workflow that can be integrated into the development process
This is where the rubber hits the road. Using Gerrit/Git allows the review process to be fully
integrated into the development process. The developer does not have to learn any new process;
developers just push their changes to Git and Gerrit takes care of the magic.

The developer pushes their code to the Git server: no special check-in process, no special software
to install. The code is pushed to a special ref, “refs/for/<branch>”, that Gerrit understands. Gerrit
then takes the changes, creates a patchset from them, and posts it to its web interface for review. It
then emails everyone on the review list that a new review is in their queue. When the reviewers log
into Gerrit, they see the patchset they have been asked to review in their queue.

Reference Links :

http://git-scm.com/

https://code.google.com/p/gerrit/

http://jenkins-ci.org/

https://wiki.jenkins-ci.org/display/JENKINS/Gerrit+Trigger

http://hudson-ci.org/
