Tuesday, September 29, 2009

Browsing open source code history

There is a killer characteristic of open source that seems to be often overlooked: the ability to browse the source code history.

This will sound trivial to many, but it's something that I've seen a lot in questions asked on stackoverflow.com and other forums. People ask: "why has feature X changed?" "Why am I getting an exception P since I upgraded project Q to version R?"

The first thing you need to do is assume there is a good reason for that change and set out to find it, instead of considering it a bug in the OSS. Keep yourself humble, remember that the people working on OSS are generally very smart and really know what they're doing.

And the really generic answer to such questions would be (without any intention of being harsh): you can usually find out yourself.

An example: this stackoverflow question (here summarized):

In the previous version of NHibernate (2.0.1) the following property will validate:

internal virtual BusinessObject Parent
{
  get { /*code*/ }
}

However, in 2.1 it errors saying that the types should be 'public/protected virtual' or 'protected internal virtual'. Why is this requirement now there?

Now I've been using NHibernate since 0.83 but I'm no expert in NHibernate internals. Despite that, I was able to answer the question by doing this:

  1. Check-out NHibernate's source.
  2. Grep the source code for "public/protected virtual". Only result is NHibernate.Proxy.DynProxyTypeValidator.
  3. Go to NHibernate's Fisheye, browse to DynProxyTypeValidator.
  4. Browse the file's history, starting from the latest diff (at the time of writing it's this one), looking for changes in the proxy validation exception message.
  5. Only two diffs back I find the relevant commit.
  6. The commit message says:
    - Fix NH-1515 (BREAKING CHANGE)
    - Minor refactoring of proxy validation stuff
  7. Go to NHibernate's JIRA. Browse to NH-1515. Read the issue description.

That's it. No special knowledge needed. This process could be further simplified by using Fisheye's search instead of locally grepping (but I never seem to get good results from Fisheye) or getting NHibernate from github instead of svn and then grepping with git-grep or gitk. Browsing history with svn is so painfully slow that I prefer to do it with Fisheye.

By the way, this is the same approach I use on closed code at work when my boss asks me "Hey, this feature used to behave differently! When and why did we change it?" (and this happens a lot)

This is where best practices like specific commit messages, atomic commits and good issue tracking really pay off.

Documentation? I save that for stuff that this approach can't possibly handle.

No comments: