eZ Community » Blogs » Arne Bakkebo » Branching and code review procedures...

By

Arne Bakkebo

Branching and code review procedures - QA#4

Wednesday 24 April 2013 9:00:30 am

  • Currently 5 out of 5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

First rule of getting started with code reviewing: Be consistent!

[QA#3 - Best practices]

Actually, that's the same rule I mentioned in the previous posts, on code formatting standards and best practices. Being consistent is a generally good thing in developing software. It saves everyone else that the software will ever interact with on any level a whole lot of time, trouble and annoyances.

If you are not consistent when starting code reviews, people will tend to stop following the procedures pretty soon. "This change is so small, it doesn't really matter." and "What could possibly go wrong with this change?" are standard phrases. Of course, things will go wrong. That's the one and only absolute fact of software development. Being consistent is all about building good habbits in people. Until these habbits are ingrained in the dark souls of the developers, do not deviate from the code review procedures.

The following will be based on git as a code repository system. It could work just as well with Subversion though  (allthough branching is not as streamlined as for git), and other code repository systems like Mercurial. Whatever you choose, learn your tools. Learn all the commands in your code repository client, and learn and understand how brancing, merging and conflict fixing works, as well as how to do diffs and read commit logs etc. This is a tool you will be using multiple times every day for the next multiple number of years. It will save you a massive amount of time if you know what you are doing, and what happens in the background when you are doing it.

I can give a quick example relating to this in the discussion of git pull vs.git fetch[1]. As many of you probably know, git pull is a shortcut to the two commands git fetch and git merge. The problem with using this shortcut, particularly if you don't know git very well, is that it obscures the fact that you actually have two copies of the repository locally, one that is a mirror of the remote repository and one that you work on locally. This is important to know, to understand what is happening when for instance you get a conflict on updating from remote server, or when you get a merge commit in the log because of the git pull command.

In our projects, I have adopted a branching philosophy like the one illustrated here:

Git branching example

Git branching example

It's inspired by the branching philosophy in [2], with a couple adjustments to make it more flexible.

We here have three main branches that are permanent, as well as multiple task branches that only live from the task is started until it is deployed to production (in praxis we tend to keep the task branches for a while, and delete them in bulk at some suitable point, for instance after release time). At the beginning, the permanent branches master, release and development are equal.

development is the testing branch, whenever a task have been accepted in the code review it is merged into development. The development branch can then be updated on a testing/staging server, and the customer can test the new feature. If the customer complains about something while testing, the developer goes back to the task branch, fixes the problem, does a new code review, and then does a new merge into the development branch.

Whenever the customer approves the task, the task branch is merged into release branch. This is the code base for an upcoming release. Any new task we want to start on can be branched out from release, and this new task branch will then include the code of any old tasks that have been completed.

When we are ready for release, we merge the release branch into master and updates the production system with the master branch. This way the master branch will always be production ready. Whenever we need an emergency quick fix, we make a new task branch based on master, does a minimal fix of the problem and then follow the same procedure as a regular task branch. However, when the fix is approved by customer we merge it right back into master and can then update the production server. Notice that for production quick fixes it is particularly important to follow the code review and testing procedures.

A few points in relation to the diagram:

 

  • At no point ever do we merge development branch into release, master or any task branch. This would contaminate release/master with tasks that are sent to testing but not yet approved for production.
  • master or release can be merged into development at any time, but if you follow correct branching procedures it should never be necessary.
  • The commits illustrated on the diagram are shown only on task branches. As you merge a task branch into development the commit log for the development branch will of course also contain the same commits, as well as a separate merge commit (if you do not merge with fast forward).
  • task1 have three commits before being merged into development and successively to release.
  • task2 have one commit before being merged into development. Apparently, the customer was not happy with it, so the developer fixes the problem and makes a new commit to the task2 branch before merging it into development again. Then task2 is approved by customer and merged into release branch.
  • In this example, when task1 and task2 are completed, we merge release into master and do a deployment to production environment with the master branch.
  • Notice that task3 are branched from release before task2 is merged into release. This means task3 branch will not contain the code for task2 solution, only task1.
  • If the task2 code and task3 code overlap, we may end up with a conflict when trying to merge task3 into development. This in itself is not a problem, but requires a little bit of extra work in resolving the conflict when merging.
  • If we anticipate this problem, we may choose to merge task2 into task3 branch either before we start developing or after task2 is approved. If we do this before task2 is approved, we need to wait with merging task3 into release until task2 have actually been approved, otherwise task2's unapproved code will follow task3 into release branch.

You'll see that this branching philosophy is quite flexible, as long as you follow procedures strictly. You'll be able to work on any number of different task branches in parallel, and can choose to deploy each independent branch to either staging or production whenever appropriate. If all this is a bit hard to understand, I recommend spending some time with the branching diagram to understand all implications of it.

In the depicted example we assume regular releases at given intervals. In one project we have adopted a continuous integration procedure, deploying each task to production as soon as it is approved by customer. We there use the same branching philosophy as before, only we just skip the release branch and merge task branches directly into master. This illustrates the flexibility of the model, you can add any number of more or less permanent branches to merge the task branches into as the need occur.

A task in this setup should always be small, no more than about eight hours estimated work. There are a number of reasons for keeping your tasks small and limited, the main one in this context is that if a task requires a lot of changes in the code, your pull request will become quite big and the code review will be very hard to do properly. If there's a lot of changes, people tend to skim through them without really paying attention, and they will get bored with the whole thing. This all amounts to a useless code review and a lot of wasted time. Also, if there's a lot of changes, it's more probable that there will be a lot of comments on the code from the reviewers, requiring the developer to go back and fix things, and making it hard to consolidate the code and get it ready for deployment. If the developer need to go many rounds of fixing things, this becomes quite frustrating and demotivating for the developer, and need to be avoided.

Often we have bigger sub projects that requires considerably more than one day of work. We then split the sub project up into smaller more or less isolated tasks, and create one task branch for each. We also create a new "task master" branch for this sub project, which we do pull requests against for the smaller task branches (instead of doing pull requests to development branch, as usual). This "task master" branch can then be merged into development and release branch as appropriate, containing all the approved code that relates to the sub project.

Before introducing code reviews in your project, agree on a procedure for how to do it, and make sure that everyone agrees with it and understands it. I can suggest the following procedure points, just to get started:

  1. Plan for a maximum of four hours development time pr.task
  2. Keep the master branch as the production ready branch at all times
  3. Always branch out a new task from master or release
  4. Use a standard for naming your task branches (more on this below)
  5. Use task branches for every change (no exception!)
  6. When done coding, push to Github and do pull request to development branch on every task branch
  7. Do a code review of your own pull request before sending it
  8. Never merge to master until you get at least one "Ok" comment from others
  9. Check others pull requests at least every two hours

Doing a code review of your own code may seem meaningless at first. However, you'll be surprised at how much you learn from reading your own code in a new context, and how much easier it is to see the obvious bugs when you look at it formatted for code reviewing.

Do not wait too long with doing the code reviewing of the pull requests from the other developers. It will extend the development time a lot, and if there is a tendency to avoid the reviews the usefullness of them will soon dwindle into nothingness.

Notice that this procedure are geared toward people and teams that are not familiar with code reviews and/or branching. Once the team get in the habbit of doing things right, you will probably want to do adjustments to optimize for ease of use. I recommend that when getting started with code reviews, do a discussion of advantages and disadvantages of the procedure every week or so. Consider making changes to the procedure if something appears not to be working well.

For naming the task branches, we also have a procedure (I'm a fan of procedures :). We use Jira for task tracking. Because of this we include the Jira task number in the branch name. This let's us easily look in Jira for more information about the history of the task given a branch. And conversely, easily find the branches relating to a given Jira task. Of course, this will work for whatever task manager you may use.

After the task number, add a short but descriptive text about the task (preferably no more than 3-4 words). A good name will let you remember what the task is about without looking it up in Jira every time. Use '-' as word separator, as this seems to be the defacto standard.

Task branch name example: '666-rest-interface-unit-test'.

Once we have the Jira task number in the branch name, we might as well add it to each commit comment also. I've set up a git hook script for doing this automatically. The advantage is that the Jira task number will remain in git history, even though the commit have been merged into master and the task branch are long gone. Also, code reviewing software like Atlassian Crucible depend on the commit comment to be able to relate to a task, it does not use branch names for this.

Add the following code to the file .git/hooks/prepare-commit-msg in your git repository:

#!/bin/sh
 
 
 
ORIG_MSG_FILE="$1"
PROJECT_ID=42707
 
TICKETNO=`git branch | grep '^\*' | perl -ne 'while ( /^\* (\d+)/g ) {print $1;}'`
 
if [[ ! $TICKETNO = '' ]]; then
        TEMP=`echo "$PROJECT_ID-$TICKETNO: " && cat $ORIG_MSG_FILE`
        echo "$TEMP" > "$ORIG_MSG_FILE"
fi

If you have more than one git repository, this must be set up in each of them.

I have done a couple workshops internally in Making Waves about what to focus on in a code review. I'd like to talk a little bit about this in my next post. Until then, good luck with the branching. And feel free to ask questions about this stuff.

[1] Git fetch vs.pull: http://longair.net/blog/2009/04/16/git-fetch-and-merge/
[2] Branching philosophy: http://nvie.com/posts/a-successful-git-branching-model/

Proudly Developed with from