October 23, 2010
This summer I went on a quest to improve my workflow. I wasn’t really happy with the standard Mercurial/mq approach used by most Mozilla developers. I spent a while experimenting with alternative ways of using Mercurial, and even did a fair amount of hacking on hg itself to fix some bugs and shortcomings. I wrote quite a long blog post about all of this and almost published it, but in the end I decided that it still wasn’t as good as I’d like it to be.
To its credit, Mercurial’s extension model made all this very doable, and I probably could have continued to cobble together a workflow that did what I wanted. However, I was pretty sure that Git did exactly what I wanted out of the box. So I gave it a shot, and it works even better than I’d hoped.
Brief Aside – A Ten-Second Introduction to Git
Git and Mercurial are quite similar – both use SHA-1 hashes to identify commits. The primary user-facing difference between Git and Mercurial is that Git branches are extremely lightweight. Git is essentially a user-space filesystem, where each commit is represented as a file named by its SHA-1 hash. A branch is nothing more than a smart alias for a hash identifier. So a Git repository consists of 3 primary things (this is a bit of an oversimplification, but it’s fine for our purposes):
- An objects directory, which contains a soup of commit files, bucketed into sub-directories.
- A refs/heads directory, which contains one file for each named branch. So if I have a branch called collectunderpants whose latest commit is 7bc99958bc164028b94ec47dbf1fb1ad9034c580, there’s a file called refs/heads/collectunderpants whose contents is simply 7bc99958bc164028b94ec47dbf1fb1ad9034c580. That’s all git needs.
- A file called HEAD containing the name of the current branch. This is important, because when I make a commit, Git needs to know which branch should be scooted forward to point to the new commit.
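To make those three pieces concrete, here is a minimal sketch that builds a throwaway repository and peeks at the files directly. It assumes git's default on-disk layout (loose objects, unpacked refs); the temp directory, file name, and demo identity are made up for illustration.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q .
echo "phase 1" > plan.txt
git add plan.txt
git commit -q -m "Collect underpants"
git checkout -q -b collectunderpants

tip=$(git rev-parse HEAD)

# HEAD names the current branch; the branch file holds nothing but a hash.
head_contents=$(cat .git/HEAD)
ref_contents=$(cat .git/refs/heads/collectunderpants)

# Loose objects are bucketed by the first two hex digits of their hash.
obj_path=".git/objects/$(printf %s "$tip" | cut -c1-2)/$(printf %s "$tip" | cut -c3-)"

echo "HEAD:        $head_contents"
echo "branch file: $ref_contents"
ls -l "$obj_path"
```

Once a repository has been packed (for example by git gc), refs and objects may live in pack files instead, so poking at these paths is only reliable in a fresh repository like this one.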
Suspend your disbelief for the time being and assume that I have a git repository called /files/mozilla/link that contains an up-to-date mirror of mozilla-central in git form (I’ll explain how this is done later).
$ cd /files/mozilla
$ git clone link src
After waiting a few moments, I now have a full git repository named src. The default branch is master, which I can see immediately because of a neat shell prompt trick (works best when put in ~/.profile):
$ export PS1='\u@\h \w$(__git_ps1 " (%s)") $ '
/files/mozilla/src (master) $ echo w00t!
So I’m on master. Unfortunately, I check TBPL and it looks like the tree is burning as a result of another Jonas Sicking push-and-run. The last green commit was 5 changesets back, so I want to base my work off of that.
(master) $ git checkout -b master-stable master~5
This makes a new branch called master-stable based 5 commits back from the commit pointed to by master, and switches the working directory to it.
I make a .mozconfig, set the objdir to /files/mozilla/build/main, make -f client.mk, and go shoot some nerf darts at dolske. A short while later, I’ve got a full build waiting for me in /files/mozilla/build/main.
Let’s run some simple diff queries:
(master-stable) $ git diff HEAD # Diffs against the current head
(master-stable) $ git diff master # Inverse of Sicking's bad push
(master-stable) $ git diff HEAD^^^ # workdir vs 3-commits-back
(master-stable) $ git diff HEAD~3 # Same as above
(master-stable) $ git diff master-stable~3 # Same as above
The ability to reference revisions symbolically (relative to either heads or branches) is really nice, and is something that I missed with Mercurial. Edit: bz points out in the comments that this is actually possible with Mercurial.
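If you want to convince yourself that the three spellings really are the same thing, here is a small sketch in a scratch repository. The ten dummy commits and the counter.txt file are stand-ins, and the branch name master is forced explicitly because newer git versions may default to something else.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # use the branch name from the post

# Ten dummy commits standing in for mozilla-central history.
i=1
while [ "$i" -le 10 ]; do
  echo "$i" > counter.txt
  git add counter.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

# Branch off five commits behind master, as in the post.
git checkout -q -b master-stable master~5

# Three ways of naming "three commits before the current head".
a=$(git rev-parse 'HEAD^^^')
b=$(git rev-parse HEAD~3)
c=$(git rev-parse master-stable~3)
stable_subject=$(git log -1 --format=%s master-stable)

echo "master-stable points at: $stable_subject"
echo "HEAD^^^          = $a"
echo "HEAD~3           = $b"
echo "master-stable~3  = $c"
```

git rev-parse is handy for this kind of sanity check: anything it can resolve to a hash can be handed to git diff, git checkout, and friends.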
Now suppose I get an idea for a quick one-off patch, and hack on a few files. To save this work (along with its ancestry), I create a branch off the current head:
(master-stable) $ ...hack hack hack...
(master-stable) $ git checkout -b oneoff
(oneoff) $ git commit -a
The first command creates a new branch called oneoff that points to the same commit as master-stable. The second creates a new commit containing the changes in the working directory. The reason for the -a option has to do with a git feature called the “index”, which is a staging area between your working directory and full-blown commits. I don’t want to digress too much, but you should definitely read more about it.
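For a quick feel of what -a buys you, here is a rough sketch in a scratch repository with two made-up files. A plain git commit only records what has been staged in the index with git add, while git commit -a first stages every modified tracked file.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q .
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "base"

# Modify both files, but stage only one of them.
echo "more a" >> a.txt
echo "more b" >> b.txt
git add a.txt

# Plain commit: only the staged change (a.txt) goes in.
git commit -q -m "only a.txt"
after_plain=$(git status --porcelain)

# With -a: every modified tracked file is staged and committed.
git commit -q -a -m "b.txt, picked up by -a"
after_all=$(git status --porcelain)

echo "left over after plain commit: [$after_plain]"
echo "left over after commit -a:    [$after_all]"
```

Note that -a never picks up brand-new untracked files; those still need an explicit git add.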
Remember that branches are just aliases to SHA-1 identifiers, which in turn are used to locate the actual commit in the soup. So oneoff is an alias for a SHA-1 identifier which points to the new commit. That commit knows the hash of its parent, which is the same hash pointed to by master-stable. Git commits are immutable, since their names are a cryptographic function of their contents (so if a commit changes, it’s really just a new commit). Furthermore, git is garbage collected when you call git gc. So objects in git are just like immutable objects in a garbage-collected language. For example, suppose we want to modify that commit we just made:
(oneoff) $ ...more hacking...
(oneoff) $ git commit -a --amend
Normally git commit makes a new child of the previous commit. However the --amend option makes a new sibling that combines the previous commit with any working changes, and points the branch and head to it. The old commit is still there, but is now orphaned, and will be removed by a later git gc once nothing (not even the reflog) still refers to it.
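The "new sibling" behaviour is easy to check by hand. This is a small sketch in a scratch repository (file and commit names are made up): it records the hash before and after an amend and compares the parents.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q .
echo base > f.txt
git add f.txt
git commit -q -m "base"

git checkout -q -b oneoff
echo "first try" >> f.txt
git commit -q -a -m "one-off patch"
old=$(git rev-parse HEAD)

# Amend: the branch moves to a brand-new commit with the same parent.
echo "second try" >> f.txt
git commit -q -a --amend -m "one-off patch"
new=$(git rev-parse HEAD)

old_parent=$(git rev-parse "$old^")
new_parent=$(git rev-parse "$new^")
old_type=$(git cat-file -t "$old")   # the orphaned commit is still in the object store

echo "old commit: $old (parent $old_parent)"
echo "new commit: $new (parent $new_parent)"
echo "old object still present as a: $old_type"
```

As long as you still have the old hash (or can dig it out of git reflog), git checkout of that hash gets the pre-amend commit back.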
I use one branch per bug, and one commit per patch. This allows me to model my patches as a DAG, where patches are descendants of work they depend on. Contrast this with the MQ model, where a linear ordering is forced upon possibly unrelated patches.
Suppose I’m doing some architectural refactoring in a bug called substrate, and using the clean new architecture in a feature bug called bling. Initially, I start work on bling as follows:
(substrate) $ git checkout -b bling
(bling) $ ...hack commit hack commit hack....
But then I think of something else that would be useful for bling that should really go in substrate. So I stash away my uncommitted changes, and go add another patch to substrate:
(bling) $ git stash
(bling) $ git checkout substrate
(substrate) $ ...hack hack...
(substrate) $ git commit -a
(substrate) $ git checkout bling
At this point, I’d really like to get back to working on bling, but unfortunately bling isn’t yet based on the latest patch in substrate. To fix this, we need to rebase:
(bling) $ git rebase --onto substrate bling~3 bling
This tells git to take all the changesets in the range (bling~3, bling] and apply them incrementally as commits on top of substrate. If there are conflicts, I’m given the opportunity to resolve them, or to abort the whole endeavor. Once the rebase is complete, the branch bling is updated to point to the new, rebased tip. Now I can reapply my work-in-progress and get back to business:
(bling) $ git stash pop
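Here is the whole stash, commit, rebase, pop dance replayed in a scratch repository, as a sketch under made-up names (the files and commit messages are placeholders, and bling is assumed to have exactly three patches on top of substrate). It uses the two-argument form of rebase: the commits after bling~3 up to bling get replayed onto substrate.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q .
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b substrate
echo "refactor 1" > substrate.txt
git add substrate.txt
git commit -q -m "substrate: part 1"

# Three bling patches stacked on top of substrate.
git checkout -q -b bling
for n in 1 2 3; do
  echo "bling $n" > "bling$n.txt"
  git add "bling$n.txt"
  git commit -q -m "bling: part $n"
done
echo "work in progress" >> bling3.txt    # uncommitted change

# Park the uncommitted work, add another patch to substrate, come back.
git stash -q
git checkout -q substrate
echo "refactor 2" >> substrate.txt
git commit -q -a -m "substrate: part 2"
git checkout -q bling

# Replay the three bling commits on top of the new substrate tip.
git rebase -q --onto substrate bling~3 bling
git stash pop -q

ahead=$(git rev-list --count substrate..bling)
wip=$(git status --porcelain)
echo "bling is $ahead commits ahead of substrate"
echo "uncommitted after pop: [$wip]"
```

In this toy history the patches touch different files, so nothing conflicts; in real life the rebase may stop and ask you to resolve things before continuing.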
My code is always perfect the first time I write it, but suppose for the sake of argument that Joe gets a bee in his bonnet and I have to alter patch 7 of 18 in bigbug to appease him. I could do it the long way:
(bigbug) $ git checkout -b _tmp HEAD~11
(_tmp) $ ...appease appease...
(_tmp) $ git commit -a --amend
(_tmp) $ git rebase --onto _tmp bigbug~11 bigbug
(_tmp) $ git checkout bigbug
(bigbug) $ git branch -d _tmp
This gets tedious after a while though. Thankfully, there’s a better way:
(bigbug) $ git rebase --interactive HEAD~12
This fires up an editor, which allows me to select which parts of the history I want to modify:
pick f901b35 patch 7
pick 613cb9e patch 8
pick db26bd3 patch 9
pick 678b170 patch 18
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
So if I change the pick on the first line to edit (or just e), git brings me to that revision, lets me edit it, and does all the rebasing for me. Huzzah!
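To try the edit flow without risking real work, here is a scripted sketch in a scratch repository. Since a script can't type into an editor, a tiny helper (todo-editor.sh, a hypothetical stand-in for me) flips the first pick to edit via the GIT_SEQUENCE_EDITOR environment variable; the four dummy patches are made up.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_EDITOR=true       # never open a real editor for commit messages

work=$(mktemp -d)            # hypothetical scratch area
mkdir "$work/repo"
cd "$work/repo"
git init -q .
for n in 1 2 3 4; do
  echo "patch $n" > "p$n.txt"
  git add "p$n.txt"
  git commit -q -m "patch $n"
done

# Stand-in for the human: change the first "pick" in the todo list to "edit".
cat > "$work/todo-editor.sh" <<'EOF'
#!/bin/sh
sed '1s/^pick/edit/' "$1" > "$1.new" && mv "$1.new" "$1"
EOF
chmod +x "$work/todo-editor.sh"

# The todo list holds patches 2..4; git stops right after applying patch 2.
GIT_SEQUENCE_EDITOR="$work/todo-editor.sh" git rebase -i HEAD~3 >/dev/null 2>&1

# Amend the stopped-at commit, then let git replay the rest.
echo "appeased" >> p2.txt
git commit -q -a --amend -m "patch 2 (appeased)"
git rebase --continue >/dev/null 2>&1

git log --reverse --format=%s
```

In day-to-day use you would skip the helper entirely and just edit the list by hand when the editor pops up.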
Pushing to Bugzilla
One nice bonus of git is an add-on developed by Owen Taylor called git-bz. I’ve made some modifications to it to make it more mozilla-friendly, and haven’t yet found the time to make them upstreamable. So in the mean time, I’d recommend that you grab my fork, git-bz-moz.
While it does a lot of things, my favorite part of git-bz is pushing to bugzilla. For credentials, git-bz uses the login cookie of your most recently opened Firefox profile – so if you’re already logged into BMO things should work seamlessly. Let’s say I want to attach all 18 patches of bigbug to bug 513681. I run:
(bigbug) $ git bz attach --no-add-url -e 513681 HEAD~18..HEAD
And then I’m presented with a sequence of 18 files to edit in my editor, each of which looks like the following:
# Attachment to Bug 513681 - Eliminate repeated code with image decoder superclass
Commit-Message: Bug 513681 - Eliminate repeated code with image decoder superclass.
#Obsoletes: 470931 - patch v2
# Please edit the description (first line) and comment (other lines). Lines
# starting with '#' will be ignored. Delete everything to abort.
# To obsolete existing patches, uncomment the appropriate lines.
This pulls the relevant data from the bug, and lets me do a lot in one edit. I can set the patch description, add a comment in the bug, edit the commit message (for facilitating hg qimport), obsolete other patches in the bug, flag for review, and grant self-review. I’ve found this to be a massive timesaver when working on many-part bugs.
When I want to push, I just qimportbz from the bug. This gives me an incentive to make sure that the patches committed are the ones on bugzilla.
Aside – I haven’t done much active development since the end of August, and git-bz just choked on the cookie database of a recent nightly when I tried it. A 3.6 profile still works fine though.
Edit – dwitte points out in the comments that this is due to a change in the sqlite database format, and should be fixed by upgrading to sqlite 3.7.x.
Multiple Working Directories
The ability to multitask is crucial to being productive in the Mozilla ecosystem. I can be waiting on tryserver results for one patch, guidance from bz on a second, review from Jeff on a third, and a dependent patch from Joe for a fourth. I need to be able to work on multiple patches at once, and context-switch quickly.
In theory, multitasking with git is quite simple: just do a git checkout of the branch you want to work on. However, some code changes require significant rebuilding. For example, if I have a patch that modifies nsDocument.h, context-switching between that patch and any other patch incurs a massive recompilation burden.
I’ve heard through the grapevine that bz manages this problem by having 8 different mercurial repositories (each with its own object directory), and economizing on space via hardlinks. This eliminates the recompilation burden, but doesn’t allow work to be easily shared between repositories. For example, I might want to give both bling and substrate separate object directories, but still be able to rebase bling on top of new code in substrate.
Thankfully, git allows me to get the best of both worlds with multiple working directories.
/files/mozilla/src (blah) $ mkdir ../proj
/files/mozilla/src (blah) $ cd ../proj
/files/mozilla/proj/ $ git-new-workdir ../src a
This gives me a full working directory and a lightweight repository that is composed mostly of symlinks to files in ../src/.git/. Everything is shared seamlessly between them, and just about the only thing private to the new repository is the HEAD file, which specifies the checked-out branch. I can then make a .mozconfig pointing to a new object directory in /files/mozilla/build/a, and build away.
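A note for anyone trying this on a newer git: git-new-workdir is a script from git's contrib directory, and git 2.5 and later ship a built-in git worktree command that covers similar ground. The sketch below uses that built-in command, which is a swapped-in technique rather than the script used above; the directory layout and branch names are placeholders.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

root=$(mktemp -d)            # hypothetical stand-in for /files/mozilla
mkdir "$root/src" "$root/proj"
cd "$root/src"
git init -q .
echo base > base.txt
git add base.txt
git commit -q -m "base"
git checkout -q -b substrate

# Second working directory on a new branch "bling", sharing src's object store.
git worktree add -b bling "$root/proj/a" substrate >/dev/null 2>&1

# Commit inside the second working directory...
cd "$root/proj/a"
echo shiny > bling.txt
git add bling.txt
git commit -q -m "bling: part 1"

# ...and the commit is immediately visible from the original one.
cd "$root/src"
src_branch=$(git rev-parse --abbrev-ref HEAD)
wd_branch=$(git -C "$root/proj/a" rev-parse --abbrev-ref HEAD)
shared=$(git log -1 --format=%s bling)

echo "src is on:    $src_branch"
echo "proj/a is on: $wd_branch"
echo "bling tip as seen from src: $shared"
```

Each working directory keeps its own checked-out files, so each one can point at its own object directory for builds while rebases across branches still work from either place.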
Earlier in this post I promised to explain where /files/mozilla/link came from.
I initially started using git with a mirror maintained by Julien Rivaud. Unfortunately, there was some flakiness with the cron job, and the repository would often stop updating from mozilla-central. So I decided to generate my own mirror. Edit: Julien mentions in the comments that the repository should be reliable now. Give it a shot!
Long story short: don't use hg-git. It chokes miserably on mozilla-central. Instead, use hg-fast-export. Let it run overnight, and it should be done in the morning. Incremental updates are also very fast (roughly linear in the number of new commits), so I don't ever find myself waiting for it.
- From a general zippiness standpoint, git seems about 5 times faster than Mercurial. Your mileage may vary.
- Overall, I really like the garbage-collection model of git. With Mercurial, rewriting history involves stripping entries out of the repository, which can be very slow. With git, unwanted objects go away just by redirecting pointers, and they're still recoverable (with careful munging) until git gc eventually prunes them.
- I've found that I'm spending a lot less time dealing with merge conflicts than I did when I was using hg/mq. Git seems to be pretty smart about these things, and I think it uses 3 lines of context internally. In contrast, it's standard to use 8 lines of context for mq patches so they can be easily exported to bugzilla. I've modified git-bz to generate 8 lines of context when posting to bugzilla, which allows me to be more efficient locally while still sharing my work in the appropriate format.
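On the context-lines point: the git-bz change itself isn't shown here, but stock git can produce the same kind of patch with the -U option to git format-patch. This is a small sketch in a scratch repository, assuming git's default of 3 context lines hasn't been overridden in your config; the 30-line file is made up.

```shell
set -e
# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)            # hypothetical scratch repository
cd "$repo"
git init -q .

# A 30-line file, then a one-line change in the middle of it.
awk 'BEGIN { for (i = 1; i <= 30; i++) print i }' > lines.txt
git add lines.txt
git commit -q -m "thirty lines"
sed '15s/.*/fifteen/' lines.txt > lines.new && mv lines.new lines.txt
git commit -q -a -m "change line 15"

# Same commit, exported with default context and with 8 lines of context.
narrow=$(git format-patch -1 --stdout | grep '^@@')
wide=$(git format-patch -1 -U8 --stdout | grep '^@@')

echo "default context: $narrow"
echo "8 lines context: $wide"
```

The hunk header tells the story: with -U8 the hunk covers 17 lines (8 before, the changed line, 8 after) instead of 7.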
There's lots more to say about git, but I think that this is enough for now. Share your experiences in the comments!