Tag: rcs

Moving my code from Launchpad to GitHub

This afternoon, I decided to convert my old code repositories from bzr to git, and move them from Launchpad to GitHub. I have converted Cassowary.net, PydgetRFID, Facade, as well as Uiml.net.

It turns out that converting a bzr repository with a single branch to git is quite easy. Here’s how I did it (after installing the latest versions of both bzr and git):

$ mkdir repo.git
$ cd repo.git
$ git init
$ bzr fast-export --plain ../repo.bzr | git fast-import
$ git reset --hard

Of course, repo.bzr is your old bzr repository here, while repo.git is your newly created git repository. The fast-export/fast-import commands convert the repository’s history from bzr to git. To populate our directory with the bzr repository’s files, however, we also need to perform a hard reset. If you need to convert multiple bzr branches to git, have a look at this post.
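If you have several single-branch repositories to convert, the steps above can be wrapped in a small shell function. This is only a sketch: the function name convert_bzr_to_git and its two arguments are my own invention for illustration, and it assumes that the bzr fast-export command is available (as in the commands above).

```shell
#!/bin/sh
# convert_bzr_to_git SRC DST (hypothetical helper, not part of bzr or git)
# SRC: existing bzr branch, DST: directory for the new git repository.
convert_bzr_to_git() {
    src=$1
    dst=$2

    # Refuse to run when SRC does not look like a bzr branch.
    if [ ! -d "$src/.bzr" ]; then
        echo "error: $src is not a bzr branch" >&2
        return 1
    fi

    # Resolve SRC to an absolute path before changing directory.
    src_abs=$(cd "$src" && pwd) || return 1
    mkdir -p "$dst" || return 1

    (
        cd "$dst" &&
        git init &&
        # Same pipeline as above: replay the bzr history into git.
        bzr fast-export --plain "$src_abs" | git fast-import &&
        # Populate the working directory from the imported history.
        git reset --hard
    )
}

# Usage: convert_bzr_to_git repo.bzr repo.git
```

Note that a plain sh pipeline only reports the exit status of git fast-import, so it is worth checking the result with git log afterwards.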

I started experimenting with distributed version control systems (DVCS) while working on my MSc thesis (around 2005). I originally maintained Cassowary.net using darcs (a DVCS written in Haskell), and switched to bzr later. Back in the early days of Uiml.net, we used CVS, which was horrible at moving or renaming files, not to mention the inability to work offline and still commit your work.

I preferred darcs and bzr over git at the time because they were a lot easier to use. These days, git seems to have caught up with bzr in that regard. Back in 2006, I also did a few performance tests and found that bzr was a lot slower than git. Things seem to have improved somewhat, but git is still The King of Speed. Oh, and GitHub is great!

Bzr vs git, the sequel

A while ago I noticed Jordan Mantha repeated my old bzr vs git benchmark.

I was curious to see if things had changed. Both systems seem to have improved in speed, but there were no shocking results.

There are some differences between our experiments, though. Jordan wonders how long it took me to commit my changes after adding them. In fact, I didn’t commit right away but did a diff first. I don’t know why; I think I just forgot to commit. This clearly showed the big difference between bzr and git before and after committing: bzr was a lot faster than git before committing, while git outperformed bzr after committing. I guess this is because git makes heavy use of data structures created during a commit.

Bzr took almost 4 minutes to diff after committing, while git took less than a second. This clearly showed a problem with bzr: it started walking the entire tree again, while git could immediately see that nothing had changed. Bzr showed the same problem when performing a status, which was unnecessarily slow. Jordan’s tests showed that the status problem has been solved, but doing a diff when nothing has changed is still a major performance issue on big trees.

Committing a small change was a lot faster for both git and bzr. And so it should be! This is, after all, a really common operation.

Jordan did another comparison with bzr, git and hg, which showed that hg is actually fairly close to git in terms of performance.

Linus Torvalds’ talk about git @ Google

Takis referred me to a Google tech talk about git by Linus Torvalds.

[youtube:http://www.youtube.com/watch?v=4XpnKHJAok8]

Although I am more of a bzr advocate (for one because it’s easier to use, and fast enough for me), he gives a good explanation of the need for distributed revision control (a category that both bzr and git belong to).

Well worth a look.

Uiml.net and Cassowary.net now in Launchpad

I just registered Uiml.net and Cassowary.net in Launchpad.

Launchpad automatically creates a Bazaar branch from the CVS repository, which allows me to maintain my work in Bazaar, while still importing changes from the Uiml.net CVS tree.

Bzr versus git

I have been exploring distributed revision control systems for almost a year now.

While I am impressed by the speed and features of git, I still prefer user-friendly interfaces such as the ones provided by darcs and bzr.

So I decided to compare the performance of git and bzr on a well-known example of a large source tree, the Linux kernel. I downloaded linux-2.6.0 and the latest version, linux-2.6.15.4.

These are the versions of git and bzr that I used:

$ git --version
git version 0.99.9c

$ bzr --version
bzr (bazaar-ng) 0.7pre

First I did an init in the 2.6.0 directory:

$ time bzr init

real    0m1.593s
user    0m0.140s
sys     0m0.047s

$ time git-init-db

real    0m0.161s
user    0m0.000s
sys     0m0.006s

Then I added all files:

$ time bzr add > /tmp/bzr-add

real    0m31.870s
user    0m31.072s
sys     0m0.520s

$ time git-add > /tmp/git-add

real    0m42.121s
user    0m32.428s
sys     0m3.208s

To my surprise bzr was quite a bit faster than git here.

Then I did a cp -r ../linux-2.6.15.4/* . inside the linux-2.6.0 directory. Now, how about doing a diff?

$ time bzr diff > /tmp/bzr-diff

real    1m13.869s
user    0m26.168s
sys     0m2.860s

$ time git-diff > /tmp/git-diff

real    2m26.982s
user    1m48.952s
sys     0m39.048s

Again, bzr is faster than git.

Let’s commit the initial revision:

$ time bzr commit -m "" > /tmp/bzr-commit

real    2m4.757s
user    1m16.578s
sys     0m6.195s

$ time git-commit -a -m "dummy." > /tmp/git-commit

real    0m54.964s
user    0m49.719s
sys     0m3.297s

As you can see, committing a large tree is where git really shines.

Next, I did a stupid test: I wanted a diff of all changes after the commit. Since I didn’t change anything, the diff would be empty:

$ time bzr diff > /tmp/bzr-diff-after-commit

real    3m51.918s
user    0m7.216s
sys     0m1.970s

$ time git-diff > /tmp/git-diff-after-commit

real    0m0.057s
user    0m0.009s
sys     0m0.047s

Git knows there have been no changes, so it immediately returns. Bzr on the other hand searches the whole tree again to see if there are changes, which is of course very slow. Git is a clear winner here.

Let’s give bzr another chance. What about a status report? This is something developers do often: they want to know what files were modified, added, deleted, and so on.

$ time bzr status > /tmp/bzr-status-after-commit

real    0m19.711s
user    0m15.180s
sys     0m1.178s

$ time git-status > /tmp/git-status-after-commit

real    0m0.442s
user    0m0.256s
sys     0m0.202s

It’s not as bad as the diff, but bzr is still way too slow here. Git, as expected, can easily check if there have been changes, so it immediately returns.

Now, let’s actually make a change. I added my name to the MAINTAINERS file, and tried to commit it.

$ time bzr commit -m "bla" > /tmp/bzr-commit-after-change

real    2m6.685s
user    0m31.734s
sys     0m3.458s

$ time git-commit -a -m "bla" > /tmp/git-commit-after-change

real    0m7.364s
user    0m6.936s
sys     0m0.430s

It takes git only 7 seconds to update its data structures, so that it can easily check for changes later. Bzr, however, seems to be traversing the whole tree again.
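For anyone who wants to replay the git half of these measurements on their own tree, here is a rough sketch. It is a simplification, not the exact procedure above: it uses the current command names (git init, git add, and so on) rather than the dashed forms of git 0.99, it skips the step of copying a newer kernel over the tree, and it only measures whole seconds. The names time_step and bench_git are made up for this example.

```shell
#!/bin/sh
# time_step LABEL COMMAND...: run COMMAND quietly and print how long it took.
# Resolution is whole seconds only, since it relies on `date +%s`.
time_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s"
    return $status
}

# bench_git TREE: import TREE into a fresh git repository, then time
# a diff and a status when nothing has changed since the commit.
bench_git() {
    (
        cd "$1" || exit 1
        time_step init git init -q
        time_step add git add .
        # Identity is passed inline so the sketch runs without global config.
        time_step commit git -c user.name=bench \
            -c user.email=bench@example.invalid commit -q -m "initial"
        time_step diff-after-commit git diff
        time_step status-after-commit git status
    )
}

# Usage: bench_git ../linux-2.6.0
```

For sub-second numbers such as the git-diff result above, the shell's own time command is still the better tool.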

In conclusion, we can say that until the initial commit, bzr is very fast (mostly even faster than git). However, after committing, git can easily check against the committed version, while this takes bzr a very long time. With bzr, performing a diff, getting the status or committing again is very slow compared to git.

Without an initial commit, bzr diff is twice as fast as git-diff. Adding is also quite a bit faster.
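These ratios can be checked directly from the real times reported above. The snippet below is just a convenience for doing that arithmetic; to_seconds and ratio are helper names I made up, and the inputs are the figures from the diff and add measurements taken before the first commit.

```shell
#!/bin/sh
# to_seconds: convert a `time`-style value such as 1m13.869s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# ratio SLOWER FASTER: how many times longer SLOWER took than FASTER.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

ratio 2m26.982s 1m13.869s   # git-diff vs bzr diff
ratio 0m42.121s 0m31.870s   # git-add vs bzr add
```

This prints 1.99 for the diff and 1.32 for the add, so git-diff took almost exactly twice as long as bzr diff, and git-add about a third longer than bzr add.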

Of course this was not a fair comparison, since the bzr developers are not optimizing for speed at the moment. Speed improvements are planned for bzr 2.0. And I can’t blame them: wasn’t it Tony Hoare who stated that premature optimization is the root of all evil?

I think bzr will certainly do for my projects, and can only get better in terms of performance. Its user experience is excellent as opposed to git’s. It also has good support for Windows, which is an important factor for general adoption in my opinion.