GitHub Flow

2011-08-31T00:00:00+00:00

Issues with git-flow

I travel all over the place teaching Git to people and nearly every class and workshop I’ve done recently has asked me what I think about git-flow. I always answer that I think that it’s great - it has taken a system (Git) that has a million possible workflows and documented a well tested, flexible workflow that works for lots of developers in a fairly straightforward manner. It has become something of a standard so that developers can move between projects or companies and be familiar with this standardized workflow.

However, it does have its issues. I have heard a number of opinions from people along the lines of not liking that new feature branches are started off of develop rather than master, or the way it handles hotfixes, but those are fairly minor.

One of the bigger issues for me is that it’s more complicated than I think most developers and development teams actually require. It’s complicated enough that a big helper script was developed to help enforce the flow. Though this is cool, the issue is that it cannot be enforced in a Git GUI, only on the command line. So the people who have to learn the complex workflow really well, because they have to do all the steps manually, are the same people who aren’t comfortable enough with the system to use it from the command line. This can be a huge problem.

Both of these issues can be solved easily just by having a much more simplified process. At GitHub, we do not use git-flow. We use, and always have used, a much simpler Git workflow.

Its simplicity gives it a number of advantages. One is that it’s easy for people to understand, which means they can pick it up quickly and they rarely if ever mess it up or have to undo steps they did wrong. Another is that we don’t need a wrapper script to help enforce it or follow it, so using GUIs and such is not a problem.

GitHub Flow

So, why don’t we use git-flow at GitHub? Well, the main issue is that we deploy all the time. The git-flow process is designed largely around the “release”. We don’t really have “releases” because we deploy to production every day - often several times a day. We can do so through our chat room robot, which is the same place our CI results are displayed. We try to make the process of testing and shipping as simple as possible so that every employee feels comfortable doing it.

There are a number of advantages to deploying so regularly. If you deploy every few hours, it’s almost impossible to introduce large numbers of big bugs. Little issues can be introduced, but then they can be fixed and redeployed very quickly. Normally you would have to do a ‘hotfix’ or something outside of the normal process, but it’s simply part of our normal process - there is no difference in the GitHub flow between a hotfix and a very small feature.

Another advantage of deploying all the time is the ability to quickly address issues of all kinds. We can respond to security issues that are brought to our attention or implement small but interesting feature requests incredibly quickly, yet we can use the exact same process to address those changes as we do to handle normal or even large feature development. It’s all the same process and it’s all very simple.

How We Do It

So, what is GitHub Flow?

1. Anything in the master branch is deployable
2. To work on something new, create a descriptively named branch off of master (e.g. new-oauth2-scopes)
3. Commit to that branch locally and regularly push your work to the same named branch on the server
4. When you need feedback or help, or you think the branch is ready for merging, open a pull request
5. After someone else has reviewed and signed off on the feature, you can merge it into master
6. Once it is merged and pushed to ‘master’, you can and should deploy immediately
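The steps above can be sketched on the command line roughly like this. It is only a sketch under some assumptions: a throwaway bare repository stands in for the server, the file name and commit messages are made up, and the pull request, review and deploy steps happen outside of Git so they only appear as comments.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work even with no Git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A local bare repository stands in for the server (assumption)
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin "$sandbox/server.git"
echo 'v1' > app.txt
git add app.txt && git commit -q -m 'initial deployable state'
git push -q origin master

# Step 2: descriptively named branch off of master
git checkout -q -b new-oauth2-scopes master

# Step 3: commit locally, push to the same named branch on the server
echo 'scopes' >> app.txt
git commit -q -am 'add oauth2 scopes'
git push -q origin new-oauth2-scopes

# Step 4: open a pull request and get review (done on GitHub, not shown)

# Step 5: after sign-off, merge into master and push
git checkout -q master
git merge -q --no-ff -m 'Merge branch new-oauth2-scopes' new-oauth2-scopes
git push -q origin master

# Step 6: deploy (outside of Git, not shown)
git log --oneline master
```

Nothing here is specific to GitHub except the review step in the middle, which is sort of the point.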

That is the entire flow. It is very simple, very effective and works for fairly large teams - GitHub is 35 employees now, maybe 15-20 of whom work on the same project (github.com) at the same time. I think that most development teams - groups that work on the same logical code at the same time which could produce conflicts - are around this size or smaller. Especially those that are progressive enough to be doing rapid and consistent deployments.

So, let’s look at each of these steps in turn.

#1 - anything in the master branch is deployable

This is basically the only hard rule of the system. There is only one branch that has any specific and consistent meaning and we named it master. To us, this means that it has been deployed or at the worst will be deployed within hours. It’s incredibly rare that this gets rewound (the branch is moved back to an older commit to revert work) - if there is an issue, commits will be reverted or new commits will be introduced that fix the issue, but the branch itself is almost never rolled back.

The master branch is stable and it is always, always safe to deploy from it or create new branches off of it. If you push something to master that is not tested or breaks the build, you break the social contract of the development team and you normally feel pretty bad about it. Every branch we push has tests run on it and reported into the chat room, so if you haven’t run them locally, you can simply push to a topic branch (even a branch with a single commit) on the server and wait for Jenkins to tell you if it passes everything.

You could have a deployed branch that is updated only when you deploy, but we don’t do that. We simply expose the currently deployed SHA through the webapp itself and curl it if we need a comparison made.
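As a rough sketch of that comparison: once you have the deployed SHA in hand, a range log shows what is on master but not yet live. The endpoint in the comment is a hypothetical URL, not our real one, and here the SHA is just captured locally so the example runs offline.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo one > file && git add file && git commit -q -m 'first'

# In real use this might look something like (hypothetical URL):
#   deployed_sha=$(curl -s https://example.com/deployed-sha)
deployed_sha=$(git rev-parse HEAD)

# Two more commits land on master after the deploy
echo two >> file && git commit -q -am 'second'
echo three >> file && git commit -q -am 'third'

# Everything reachable from master but not from the deployed commit
git log --oneline "$deployed_sha"..master
pending_count=$(git rev-list --count "$deployed_sha"..master)
echo "$pending_count commits waiting to be deployed"
```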

#2 - create descriptive branches off of master

When you want to start work on anything, you create a descriptively named branch off of the stable master branch. Some examples in the GitHub codebase right now would be user-content-cache-key, submodules-init-task or redis2-transition. This has several advantages - one is that when you fetch, you can see the topics that everyone else has been working on. Another is that if you abandon a branch for a while and go back to it later, it’s fairly easy to remember what it was.

This is nice because when we go to the GitHub branch list page we can easily see what branches have been worked on recently and roughly how much work they have on them.

It’s almost like a list of upcoming features with current rough status. If you’re not using this page, you should be, because it’s awesome - it only shows you branches that have unique work on them relative to your currently selected branch and it sorts them so that the ones most recently worked on are at the top. If I get really curious, I can click on the ‘Compare’ button to see what the actual unified diff and commit list is that is unique to that branch.

So, as of this writing, we have 44 branches in our repository with unmerged work in them, but I can also see that only about 9 or 10 of them have been pushed to in the last week.
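You can get a rough command line approximation of that page, too. This is a sketch on a throwaway repository with local branches (the branch names are borrowed from the examples above, the commits are made up); for remote-tracking branches you would add -r to git branch and point for-each-ref at refs/remotes instead.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo base > README && git add README && git commit -q -m 'base'

# A topic branch that has already been merged into master
git checkout -q -b submodules-init-task
echo task > task.txt && git add task.txt && git commit -q -m 'add init task'
git checkout -q master && git merge -q submodules-init-task

# A topic branch whose work is still unmerged
git checkout -q -b redis2-transition
echo redis > redis.txt && git add redis.txt && git commit -q -m 'start redis2 transition'
git checkout -q master

# Branches with unique work relative to master
unmerged=$(git branch --no-merged master)
echo "$unmerged"

# All branches, most recently committed first
git for-each-ref --sort=-committerdate \
  --format='%(committerdate:short) %(refname:short)' refs/heads
```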

#3 - push to named branches constantly

Another big difference from git-flow is that we push to named branches on the server constantly. Since the only thing we really have to worry about is master from a deployment standpoint, pushing to the server doesn’t mess anyone up or confuse things - everything that is not master is simply something being worked on.

It also makes sure that our work is always backed up in case of laptop loss or hard drive failure. More importantly, it puts everyone in constant communication. A simple ‘git fetch’ will basically give you a TODO list of what everyone is currently working on.

$ git fetch
remote: Counting objects: 3032, done.
remote: Compressing objects: 100% (947/947), done.
remote: Total 2672 (delta 1993), reused 2328 (delta 1689)
Receiving objects: 100% (2672/2672), 16.45 MiB | 1.04 MiB/s, done.
Resolving deltas: 100% (1993/1993), completed with 213 local objects.
From github.com:github/github
 * [new branch]      charlock-linguist -> origin/charlock-linguist
 * [new branch]      enterprise-non-config -> origin/enterprise-non-config
 * [new branch]      fi-signup  -> origin/fi-signup
   2647a42..4d6d2c2  git-http-server -> origin/git-http-server
 * [new branch]      knyle-style-commits -> origin/knyle-style-commits
   157d2b0..d33e00d  master     -> origin/master
 * [new branch]      menu-behavior-act-i -> origin/menu-behavior-act-i
   ea1c5e2..dfd315a  no-inline-js-config -> origin/no-inline-js-config
 * [new branch]      svg-tests  -> origin/svg-tests
   87bb870..9da23f3  view-modes -> origin/view-modes
 * [new branch]      wild-renaming -> origin/wild-renaming

It also lets everyone see, by looking at the GitHub Branch List page, what everyone else is working on so they can inspect them and see if they want to help with something.

#4 - open a pull request at any time

GitHub has an amazing code review system called Pull Requests that I fear not enough people know about. Many people use it for open source work - fork a project, update the project, send a pull request to the maintainer. However, it can also easily be used as an internal code review system, which is what we do.

Actually, we use it more as a branch conversation view than as a pull request. You can send pull requests from one branch to another in a single project (public or private) in GitHub, so you can use them to say “I need help or review on this” in addition to “Please merge this in”.

Here you can see Josh cc’ing Brian for review and Brian coming in with some advice on one of the lines of code. Further down we can see Josh acknowledging Brian’s concerns and pushing more code to address them.

Finally you can see that we’re still in the trial phase - this is not a deployment ready branch yet, we use the Pull Requests to review the code long before we actually want to merge it into master for deployment.

If you are stuck in the progress of your feature or branch and need help or advice, or if you are a developer and need a designer to review your work (or vice versa), or even if you have little or no code but some screenshot comps or general ideas, you open a pull request. You can cc people in the GitHub system by adding in a @username, so if you want the review or feedback of specific people, you simply cc them in the PR message (as you saw Josh do above).

This is cool because the Pull Request feature lets you comment on individual lines in the unified diff, on single commits or on the pull request itself, and pulls everything inline into a single conversation view. It also lets you continue to push to the branch, so if someone comments that you forgot to do something or there is a bug in the code, you can fix it and push to the branch. GitHub will show the new commits in the conversation view and you can keep iterating on a branch like that.

If the branch has been open for too long and you feel it’s getting out of sync with the master branch, you can merge master into your topic branch and keep going. You can easily see in the pull request discussion or commit list when the branch was last brought up to date with the ‘master’.
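Bringing a long-running topic branch back up to date is just an ordinary merge in the other direction. A minimal sketch on a throwaway repository, with made-up file names and commit messages:

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo base > README && git add README && git commit -q -m 'base'

# Start a topic branch and do some work on it
git checkout -q -b view-modes
echo view > view.txt && git add view.txt && git commit -q -m 'work on view modes'

# Meanwhile, master moves ahead
git checkout -q master
echo fix > fix.txt && git add fix.txt && git commit -q -m 'unrelated fix on master'

# Merge master into the topic branch and keep going
git checkout -q view-modes
git merge -q -m 'Merge master into view-modes' master
git log --oneline --graph
```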

When everything is really and truly done on the branch and you feel it’s ready to deploy, you can move on to the next step.

#5 - merge only after pull request review

We don’t simply do work directly on master or work on a topic branch and merge it in when we think it’s done - we try to get signoff from someone else in the company. This is generally a +1 or emoji or “:shipit:” comment, but we try to get someone else to look at it.

Once we get that, and the branch passes CI, we can merge it into master for deployment, which will automatically close the Pull Request when we push it.

#6 - deploy immediately after review

Finally, your work is done and merged into the master branch. This means that even if you don’t deploy it now, people will base new work off of it and the next deploy, which will likely happen in a few hours, will push it out. So since you really don’t want someone else to push something that you wrote that breaks things, people tend to make sure that it really is stable when it’s merged and people also tend to push their own changes.

Our campfire bot, hubot, can do deployments for any of the employees, so a simple:

hubot deploy github to production

will deploy the code and zero-downtime restart all the necessary processes. You can see how common this is at GitHub:

You can see 6 different people (including a support guy and a designer) deploying about 24 times in one day.

I have done this for branches with one commit containing a one line change. The process is simple, straightforward, scalable and powerful. You can do it with feature branches with 50 commits on them that took 2 weeks, or 1 commit that took 10 minutes. It is such a simple and frictionless process that you are not annoyed that you have to do it even for 1 commit, which means people rarely try to skip or bypass the process unless the change is so small or insignificant that it just doesn’t matter.

This is an incredibly simple and powerful process. I think most people would agree that GitHub has a very stable platform, that issues are addressed quickly if they ever come up at all, and that new features are introduced at a rapid pace. There is no compromise of quality or stability so that we can get more speed or simplicity or less process.

Conclusion

Git itself is fairly complex to understand, so making the workflow that you use with it more complex than necessary simply adds more mental overhead to everybody’s day. I would always advocate using the simplest possible system that will work for your team and doing so until it doesn’t work anymore and then adding complexity only as absolutely needed.

For teams that have to do formal releases on a longer term interval (a few weeks to a few months between releases), and be able to do hot-fixes and maintenance branches and other things that arise from shipping so infrequently, git-flow makes sense and I would highly advocate its use.

For teams that have set up a culture of shipping, who push to production every day, who are constantly testing and deploying, I would advocate picking something simpler like GitHub Flow.

Reset Demystified

2011-07-11T00:00:00+00:00

Thanks for visiting!

This blog post content has been incorporated into the Pro Git Book, Chapter 7.7.

Please check it out there, as it's more up to date and has pictures and stuff.

Note to Self

2010-08-25T00:00:00+00:00

One of the cool things about Git is that it has strong cryptographic integrity. If you change any bit in the commit data or any of the files it keeps, all the checksums change, including the commit SHA and every commit SHA since that one. However, that means that in order to amend the commit in any way, for instance to add some comments on something or even sign off on a commit, you have to change the SHA of the commit itself.

Wouldn’t it be nice if you could add data to a commit without changing its SHA? If only there existed an external mechanism to attach data to a commit without modifying the commit message itself. Happy day! It turns out there exists just such a feature in newer versions of Git! As we can see from the Git 1.6.6 release notes where this new functionality was first introduced:

* "git notes" command to annotate existing commits.

Need any more be said? Well, maybe. How do you use it? What does it do? How can it be useful? I’m not sure I can answer all of these questions, but let’s give it a try. First of all, how does one use it?

Well, to add a note to a specific commit, you only need to run git notes add [commit], like this:

$ git notes add HEAD

This will open up your editor to write your note. You can also use the -m option to provide the note right on the command line:

$ git notes add -m 'I approve - Scott' master~1

That will add a note to the first parent of the last commit on the master branch. Now, how to view these notes? The easiest way is with the git log command.

$ git log master
commit 0385bcc3bc66d1b1ec07346c237061574335c3b8
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Tue Jun 22 20:09:32 2010 -0700

  yield to run block right before accepting connections

commit 06ca03a20bb01203e2d6b8996e365f46cb6d59bd
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Wed May 12 06:47:15 2010 -0700

  no need to delete these header names now

Notes:
  I approve - Scott

You can see the notes appended automatically in the log output. You can only have one note per commit in a namespace though (I will explain namespaces in the next section), so if you want to add another note to that commit, you have to edit the existing one instead. You can do this by running:

$ git notes edit master~1

Which will open a text editor with the existing note so you can edit it:

I approve - Scott

#
# Write/edit the notes for the following object:
#
# commit 06ca03a20bb01203e2d6b8996e365f46cb6d59bd
# Author: Ryan Tomayko <rtomayko@gmail.com>
# Date:   Wed May 12 06:47:15 2010 -0700
# 
#     no need to delete these header names now
# 
#  kidgloves.rb |    2 --
#  1 files changed, 0 insertions(+), 2 deletions(-)
~                                                                               
~                                                                               
~                                                                               
".git/NOTES_EDITMSG" 13L, 338C

Sort of weird, but it works. If you just want to add something to the end of an existing note, you can run git notes append SHA, but only in newer versions of Git (I think 1.7.1 and above).
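To pull the pieces of this section together, here is a small sketch on a throwaway repository. It adds a note, appends to it, and checks the thing we cared about in the first place: the commit SHA does not change. The second note text is made up for illustration.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
echo hi > README && git add README && git commit -q -m 'first commit'

before=$(git rev-parse HEAD)

# Attach a note, then add more text to the same note
git notes add -m 'I approve - Scott' HEAD
git notes append -m 'Tested on staging' HEAD

after=$(git rev-parse HEAD)

# Print just the note, then the commit with its note
git notes show HEAD
git log -1

if [ "$before" = "$after" ]; then
  echo "commit SHA unchanged: $after"
fi
```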

Notes Namespaces

Since you can only have one note per commit, Git allows you to have multiple namespaces for your notes. The default namespace is called ‘commits’, but you can change that. Let’s say we’re using the ‘commits’ notes namespace to store general comments but we want to also store bugzilla information for our commits. We can also have a ‘bugzilla’ namespace. Here is how we would add a bug number to a commit under the bugzilla namespace:

$ git notes --ref=bugzilla add -m 'bug #15' 0385bcc3

However, now you have to tell Git to specifically look in that namespace:

$ git log --show-notes=bugzilla
commit 0385bcc3bc66d1b1ec07346c237061574335c3b8
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Tue Jun 22 20:09:32 2010 -0700

  yield to run block right before accepting connections

Notes (bugzilla):
  bug #15

commit 06ca03a20bb01203e2d6b8996e365f46cb6d59bd
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Wed May 12 06:47:15 2010 -0700

  no need to delete these header names now

Notes:
  I approve - Scott

Notice that it also will show your normal notes. You can actually have it show notes from all your namespaces by running git log --show-notes=* - if you have a lot of them, you may want to just alias that. Here is what your log output might look like if you have a number of notes namespaces:

$ git log -1 --show-notes=*
commit 0385bcc3bc66d1b1ec07346c237061574335c3b8
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Tue Jun 22 20:09:32 2010 -0700

    yield to run block right before accepting connections

Notes:
    I approve of this, too - Scott

Notes (bugzilla):
    bug #15

Notes (build):
    build successful (8/13/10)

You can also switch the current namespace you’re using so that the default for writing and showing notes is not ‘commits’ but, say, ‘bugzilla’ instead. If you export the variable GIT_NOTES_REF to point to something different, then the --ref and --show-notes options are not necessary. For example:

$ export GIT_NOTES_REF=refs/notes/bugzilla

That will set your default to ‘bugzilla’ instead. It has to start with the ‘refs/notes/’ though.
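Here is a quick sketch of the effect on a throwaway repository: the same git notes show command returns a different note once GIT_NOTES_REF points at the ‘bugzilla’ namespace.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
echo hi > README && git add README && git commit -q -m 'first commit'

# One note in the default namespace, one in 'bugzilla'
git notes add -m 'I approve - Scott' HEAD
git notes --ref=bugzilla add -m 'bug #15' HEAD

# With no override, notes come from refs/notes/commits
default_note=$(git notes show HEAD)

# Switch the default namespace for this shell session
export GIT_NOTES_REF=refs/notes/bugzilla
bugzilla_note=$(git notes show HEAD)

echo "default namespace:  $default_note"
echo "bugzilla namespace: $bugzilla_note"
```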

Now, here is where the general usability of this really breaks down. I am hoping that this will be improved in the future and I put off writing this post because of my concern with this phase of the process, but I figured it has interesting enough functionality as-is that someone might want to play with it.

So, the notes (as you may have noticed in the previous section) are stored as references, just like branches and tags. This means you can push them to a server. However, Git has a bit of magic built in to expand a branch name like ‘master’ to what it really is, which is ‘refs/heads/master’. Unfortunately, Git has no such magic built in for notes. So, to push your notes to a server, you cannot simply run something like git push origin bugzilla. Git will do this:

$ git push origin bugzilla
error: src refspec bugzilla does not match any.
error: failed to push some refs to 'git@github.com:schacon/kidgloves.git'

However, you can push anything under ‘refs/’ to a server, you just need to be more explicit about it. If you run this it will work fine:

$ git push origin refs/notes/bugzilla
Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 263 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@github.com:schacon/kidgloves.git
 * [new branch]      refs/notes/bugzilla -> refs/notes/bugzilla

In fact, you may want to just make that git push origin refs/notes/* which will push all your notes. This is what Git does normally for something like tags. When you run git push origin --tags it basically expands to git push origin refs/tags/*.
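If you want to try the push yourself without touching a real server, something like this should do. It is a sketch that uses a local bare repository as a stand-in for the remote. Note the quotes around the pattern, so that the shell does not try to expand the asterisk before Git sees it.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A local bare repository stands in for the server (assumption)
git init -q --bare server.git

git init -q repo && cd repo
git remote add origin "$sandbox/server.git"
echo hi > README && git add README && git commit -q -m 'first commit'
git notes --ref=bugzilla add -m 'bug #15' HEAD

# The short name is not expanded to refs/notes/bugzilla, so this fails
if git push -q origin bugzilla 2>/dev/null; then
  echo "short name pushed (not expected)"
else
  echo "short name rejected, as described above"
fi

# Being explicit works; the pattern pushes every notes namespace
git push -q origin 'refs/notes/*'
git --git-dir="$sandbox/server.git" for-each-ref refs/notes
```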

Getting Notes

Unfortunately, getting notes is even more difficult. Not only is there no git fetch --notes or something, you have to specify both sides of the refspec (as far as I can tell).

$ git fetch origin refs/notes/*:refs/notes/*
remote: Counting objects: 12, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 12 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
From github.com:schacon/kidgloves
 * [new branch]      refs/notes/bugzilla -> refs/notes/bugzilla

That is basically the only way to get them into your repository from the server. Yay. If you want to, you can set up your Git config file to automatically pull them down though. If you look at your .git/config file you should have a section that looks like this:

[remote "origin"]
  fetch = +refs/heads/*:refs/remotes/origin/*
  url = git@github.com:schacon/kidgloves.git

The ‘fetch’ line is the refspec of what Git will try to do if you run just git fetch origin. It contains the magic formula of what Git will fetch and store local references to. For instance, in this case it will take every branch on the server and give you a local branch under ‘remotes/origin/’ so you can reference the ‘master’ branch on the server as ‘remotes/origin/master’ or just ‘origin/master’ (it will look under ‘remotes’ when it’s trying to figure out what you’re doing). If you change that line to fetch = +refs/heads/*:refs/remotes/manamana/* then even though your remote is named ‘origin’, the master branch from your ‘origin’ server will be under ‘manamana/master’.

Anyhow, you can use this to make your notes fetching easier. If you add multiple fetch lines, it will do them all. So in addition to the current fetch line, you can add a line that looks like this:

  fetch = +refs/notes/*:refs/notes/*

Which says also get all the notes references on the server and store them as though they were local notes. Or you can namespace them if you want, but that can cause issues when you try to push them back again.
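Rather than editing .git/config by hand, you can add the extra fetch line with git config --add. A sketch, again with a local bare repository playing the server and two throwaway clones (the directory names ‘author’ and ‘reader’ are made up):

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Server stand-in (assumption), with HEAD pointing at master
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# One person writes a note and pushes it along with master
git init -q author && cd author
git symbolic-ref HEAD refs/heads/master
git remote add origin "$sandbox/server.git"
echo hi > README && git add README && git commit -q -m 'first commit'
git notes --ref=bugzilla add -m 'bug #15' HEAD
git push -q origin master 'refs/notes/*'

# Someone else clones; notes are not fetched by default
cd "$sandbox"
git clone -q server.git reader && cd reader

# Add a second fetch line so a plain 'git fetch' also gets notes
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'
git fetch -q origin

fetched_note=$(git notes --ref=bugzilla show origin/master)
echo "$fetched_note"
```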

Collaborating on Notes

Now, this is where the main problem is. Merging notes is super difficult. This means that if you pull down someone’s notes, you edit any note in a namespace locally and the other developer edits any note in that same namespace, you’re going to have a hard time getting them back in sync. When the second person tries to push their notes it will look like a non-fast-forward just like a normal branch update, but unlike a normal branch you can’t just run git pull and then try again. You have to check out your notes ref as if it were a normal branch, which will look ridiculously confusing, and then do the merge and then switch back. It is do-able, but probably not something you really want to do.

Because of this, it’s probably best to namespace your notes or better just have an automated process create them (like build statuses or bugzilla artifacts). If only one entity is updating your notes, you won’t have merge issues. However, if you want to use them to comment on commits within a team, it is going to be a bit painful.
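For what it’s worth, Git releases that came out after this was written added a git notes merge subcommand that takes some of the pain out of this. The sketch below is not the checkout-based process described above, just an illustration of that newer command. It assumes the other person’s notes have already been fetched into a separate namespace, here a made-up one called ‘teammate’, and uses the union strategy to concatenate conflicting notes.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
echo hi > README && git add README && git commit -q -m 'first commit'

# My note, in the default namespace (refs/notes/commits)
git notes add -m 'I approve - Scott' HEAD

# Stand-in for notes fetched from someone else (hypothetical namespace)
git notes --ref=teammate add -m 'Looks good - Josh' HEAD

# Merge their notes into mine; 'union' keeps both texts on conflict
git notes merge -s union refs/notes/teammate

merged_note=$(git notes show HEAD)
echo "$merged_note"
```

Other strategies exist (ours, theirs, manual and cat_sort_uniq), so which one fits depends on what your notes are used for.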

So far, I’ve heard of people using them to have their ticketing system attach metadata automatically or have a system attach associated mailing list emails to commits they concern. Other people just use them entirely locally without pushing them anywhere to store reminders for themselves and whatnot.
Probably a good start, but the ambitious among you may come up with something else interesting to do. Let me know!

Pro Git Simplified Chinese Edition

2010-06-09T00:00:00+00:00

The amazing Chunzi, in addition to translating and helping to coordinate the translation of the Chinese version of Pro Git, has just sent me a draft epub of the book in Chinese.

As he says:

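“Download link: http://cl.ly/da7a450319adfac01108

The translation of this book is basically complete, but more than half of it has not been reviewed yet. On a whim today, I made an epub version that can be read on the iPad. The code that generates it is at http://github.com/chunzi/progit/tree/master/epub-zh.

Since this is an open source book, it is a fairly low priority. I will make a new version when there are further updates.”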

I have also uploaded the file to S3 in addition to his cl.ly link. If you want Pro Git in Chinese on your iPad, you can download it here.

Good luck!

Pro Git Kindle Version Available

2010-06-06T00:00:00+00:00

When Pro Git was first released, I asked about being able to get it on my Kindle. In fact, one of the very first people to read the book was @adelcambre with a mobi file I generated myself. It was horrible looking, because I didn’t do it very well, but it did work. My editor at Apress wanted to get a professional Kindle version produced, but wasn’t sure if it was going to get done anytime soon.

Well, I just randomly found out that it did in fact finally get done. About 9 months after it was first published, I saw on my Amazon referral account that I had sold a Kindle version, which confused me. However, I went to Amazon and there it was:

I assume it looks much, much better than the version I originally did for Andy. (I should probably get myself a copy). So, if you feel like reading it on the Kindle, have at it. There have also been a number of people who have asked me how they can support the book without getting a dead-tree version, so here’s your chance. You can even get a Kindle reader for your computer, so you can search through it and whatnot. Enjoy!

My New Blog

2010-05-25T00:00:00+00:00

My previous blog, which moved from host to host, domain to domain over the years has built up a lot of cruft. I believe I went from a Ruby blog to WordPress to a custom blog engine back to WordPress to Jekyll and it’s a mess. Plus, a lot of the stuff I used to blog about is not really relevant to me anymore and the domain name I did everything under, “jointheconversation.org”, was too long and not as relevant to what I’m doing as it used to be.

So, I’ve decided to declare blog bankruptcy - blogruptcy - and start over again with a new site design (this time done by me, as you can probably tell) and new content. I pulled over a few of the last entries of my previous blog but for the most part I’m starting over.

So there.

Git Loves the Environment

2010-04-11T00:00:00+00:00

One of the things that people that come from the Subversion world tend to find pretty cool about Git is that there is no .svn directory in every subdirectory of your project, but instead just one .git directory in the root of your project. Actually, it’s even better than that. The .git directory does not even need to actually be within your project. Git allows you to tell it where your .git directory is, and there are a couple of ways to do that.

Let’s say you have your project and want to move the .git directory somewhere else. First let’s see what happens when we move our .git directory without telling Git.

$ git log --oneline
6e948ec my second commit
fda8c93 my initial commit
$ mv .git /opt/myproject.git
$ git log
fatal: Not a git repository (or any of the parent directories): .git

Well, since Git can’t find a .git directory, it appears that you are simply in a directory that is not controlled by Git. However, it’s pretty easy to tell Git to look elsewhere by providing the --git-dir option to any Git call:

$ git --git-dir=/opt/myproject.git log --oneline
6e948ec my second commit
fda8c93 my initial commit

However, you probably don’t want to do that for every Git call, as that is a lot of typing. You could create a shell alias, but you can also export an environment variable called GIT_DIR.

$ export GIT_DIR=/opt/myproject.git
$ git log --oneline
6e948ec my second commit
fda8c93 my initial commit
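If you want to reproduce this without touching /opt, here is a sketch that does the same thing inside a temporary directory. The paths are throwaway ones created by mktemp, not the ones used above.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q myproject && cd myproject
echo one > README && git add README && git commit -q -m 'my initial commit'
echo two >> README && git commit -q -am 'my second commit'

# Move the repository data out of the project directory
mv .git "$sandbox/myproject.git"

# Option 1: point at it for a single command
git --git-dir="$sandbox/myproject.git" log --oneline

# Option 2: export it for the rest of the shell session
export GIT_DIR="$sandbox/myproject.git"
git log --oneline
commit_count=$(git rev-list --count HEAD)
```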

There are a number of ways to customize Git functionality via specific environment variables. You can also tell Git where your working directory is with GIT_WORK_TREE, so you can run the Git commands from any directory you are in, not just the current working directory. To see this, first we’ll change a file and then change directories and run git status.

$ echo 'test' >> README 
$ git status --short
M README

OK, but now if we change working directories, we’ll get weird output.

$ cd /tmp
$ git status --short
 D README
 ?? .ksda.1F5/
 ?? aprKhGx02
 ?? qlm.log
 ?? qlmlog.txt
 ?? smsi02122

Now Git is comparing your last commit to what is in your current working directory. However, you can tell it where your real Git working directory is without being in it, either with the --work-tree option or by exporting the GIT_WORK_TREE variable:

$ git --work-tree=/tmp/myproject status --short
 M README
$ export GIT_WORK_TREE=/tmp/myproject
$ git status --short
 M README
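The same experiment can be run in a scratch directory. This sketch uses mktemp paths rather than /tmp/myproject, and sets both GIT_DIR and GIT_WORK_TREE so that Git can be run from a directory that has nothing to do with the project.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q myproject && cd myproject
echo one > README && git add README && git commit -q -m 'my initial commit'
echo 'test' >> README

# Leave the project directory entirely
cd "$sandbox"

# Option 1: pass both locations on the command line
git --git-dir="$sandbox/myproject/.git" \
    --work-tree="$sandbox/myproject" status --short

# Option 2: export them for the rest of the shell session
export GIT_DIR="$sandbox/myproject/.git"
export GIT_WORK_TREE="$sandbox/myproject"
status_out=$(git status --short)
echo "$status_out"
```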

Now you’re doing operations on a Git repository outside of your working directory, which you’re not even in.

The last interesting variable you can set is your staging area. That is normally in the .git/index file, but again, you can set it somewhere else, so that you can have multiple staging areas that you can switch between if you want.

$ export GIT_INDEX_FILE=/tmp/index1
$ git add README
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
# modified:   README
#

Now we have the README change staged in the new index file. If we switch back to our original index, we can see that the file is no longer staged:

$ export GIT_INDEX_FILE=/opt/myproject.git/index
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified:   README
#
no changes added to commit (use "git add" and/or "git commit -a")

This is not quite as useful in day to day work, but it is pretty cool for building arbitrary trees and whatnot. We’ll explore how to use that to do neat things in a future post when we talk more about some of the lower level Git plumbing commands.
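Here is a self-contained sketch of switching between two staging areas, in a throwaway repository. One difference from the transcript above, and it is my own addition: the alternate index is seeded with git read-tree HEAD first, so that in a project with more than one file the other files do not show up as staged deletions.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q myproject && cd myproject
echo one > README && git add README && git commit -q -m 'my initial commit'
echo 'test' >> README

# Use a second index file and seed it from the last commit
export GIT_INDEX_FILE="$sandbox/index1"
git read-tree HEAD
git add README
staged_alt=$(git diff --cached --name-only)

# Back to the default index (.git/index): nothing is staged there
unset GIT_INDEX_FILE
staged_default=$(git diff --cached --name-only)

echo "staged in alternate index: $staged_alt"
echo "staged in default index:   ${staged_default:-nothing}"
```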

Replace Kicker

2010-03-17T00:00:00+00:00

In another of my series of “seemingly hidden Git features”, I would like to introduce git replace. Now, documentation exists for git replace, but it is rather unclear on what you would actually use it for (or even what it really does), so let’s go through a couple of examples as it really is quite powerful.

The replace command basically will take one object in your Git database and for most purposes replace it with another object. This is most commonly useful for replacing one commit in your history with another one.

For example, say you want to split your history into one short history for new developers and one much longer and larger history for people interested in data mining. You can graft one history onto the other by replacing the earliest commit in the new line with the latest commit on the older one. This is nice because it means that you don’t actually have to rewrite every commit in the new history, as you would normally have to do to join them together (because the parentage affects the SHAs).

Let’s try this out. Let’s take an existing repository, split it into two repositories, one recent and one historical, and then we’ll see how we can recombine them without modifying the recent repository’s SHA values via replace.

We’ll use a simple repository with five simple commits:

$ git log --oneline
ef989d8 fifth commit
c6e1e95 fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit

We want to break this up into two lines of history. One line goes from commit one to commit four - that will be the historical one. The second line will just be commits four and five - that will be the recent history.

Well, creating the historical history is easy, we can just put a branch in the history and then push that branch to the master branch of a new remote repository.

$ git branch history c6e1e95
$ git log --oneline --decorate
ef989d8 (HEAD, master) fifth commit
c6e1e95 (history) fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit

Now we can push the new history branch to the master branch of our new repository:

$ git remote add history git@github.com:schacon/project-history.git
$ git push history history:master
Counting objects: 12, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (12/12), 907 bytes, done.
Total 12 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
To git@github.com:schacon/project-history.git
 * [new branch]      history -> master

OK, so our history is published. Now the harder part is truncating our current history down so it’s smaller. We need an overlap so we can replace a commit in one with an equivalent commit in the other, so we’re going to truncate this to just commits four and five (so commit four overlaps).

$ git log --oneline --decorate
ef989d8 (HEAD, master) fifth commit
c6e1e95 (history) fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit

I think it’s useful in this case to create a base commit that has instructions on how to expand the history, so other developers know what to do if they hit the first commit in the truncated history and need more. So, what we’re going to do is create an initial commit object as our base point with instructions, then rebase the remaining commits (four and five) on top of it. To do that, we need to choose a point to split at, which for us is the third commit, which is 9c68fdc in SHA-speak. So, our base commit will be based off of that tree. We can create our base commit using the commit-tree command, which just takes a tree and will give us a brand new, parentless commit object SHA back.

$ echo 'get history from blah blah blah' | git commit-tree 9c68fdc^{tree}
622e88e9cbfbacfb75b5279245b9fb38dfea10cf

OK, so now that we have a base commit, we can rebase the rest of our history on top of that with git rebase --onto. The --onto argument will be the SHA we just got back from commit-tree and the rebase point will be the third commit (9c68fdc again):

$ git rebase --onto 622e88 9c68fdc
First, rewinding head to replay your work on top of it...
Applying: fourth commit
Applying: fifth commit

OK, so now we’ve re-written our recent history on top of a throw away base commit that now has instructions in it on how to reconstitute the entire history if we wanted to. Now let’s see what those instructions would be (here is where replace finally comes into play).

So to get the history data after cloning this truncated repository, one would have to add a remote for the historical repository and fetch:

$ git remote add history git://github.com/schacon/project-history.git
$ git fetch history
From git://github.com/schacon/project-history.git
 * [new branch]      master     -> history/master

Now the collaborator would have their recent commits in the ‘master’ branch and the historical commits in the ‘history/master’ branch.

$ git log --oneline master
e146b5f fifth commit
81a708d fourth commit
622e88e get history from blah blah blah
$ git log --oneline history/master
c6e1e95 fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit

To combine them, you can simply call git replace with the commit you want to replace and then the commit you want to replace it with. So we want to replace the ‘fourth’ commit in the master branch with the ‘fourth’ commit in the ‘history/master’ branch:

$ git replace 81a708d c6e1e95

Now, if you look at the history of the master branch, it looks like this:

$ git log --oneline
e146b5f fifth commit
81a708d fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit

Cool, right? Without having to change all the SHAs upstream, we were able to replace one commit in our history with an entirely different commit and all the normal tools (bisect, blame, etc) will work how we would expect them to.
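If you want to try the whole truncate-and-graft dance without touching a real project, something like the following sketch should reproduce it in a scratch repository. It is a simplification: the old commits stay in the same object database instead of being fetched from a separate history remote, and the identity values, file name and variable names are all made up for the demo.

```shell
#!/bin/sh
# Sketch: truncate a five-commit history, then graft it back with git replace.
set -e
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
for n in first second third fourth fifth; do
  echo "$n" >> file.txt
  git add file.txt
  git commit -q -m "$n commit"
done

third=$(git rev-parse HEAD~2)
old_fourth=$(git rev-parse HEAD~1)

# A parentless base commit that reuses the third commit's tree.
base=$(echo 'get history from the history repository' |
       git commit-tree "$third^{tree}")

# Replay the fourth and fifth commits on top of that base.
git rebase -q --onto "$base" "$third"
new_fourth=$(git rev-parse HEAD~1)
truncated_count=$(git rev-list --count HEAD)

# Graft: wherever the rewritten fourth commit is read, use the original.
git replace "$new_fourth" "$old_fourth"
joined_count=$(git rev-list --count HEAD)

# The replacement is only a view; it can be switched off per command.
raw_count=$(git --no-replace-objects rev-list --count HEAD)

echo "truncated: $truncated_count, joined: $joined_count, raw: $raw_count"
```

The --no-replace-objects switch is handy when you need to see what is really stored, since the replaced view otherwise applies to nearly every command.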

Interestingly, it still shows 81a708d as the SHA, even though it’s actually using the c6e1e95 commit data that we replaced it with. Even if you run a command like cat-file, it will show you the replaced data:

$ git cat-file -p 81a708d
tree 7bc544cf438903b65ca9104a1e30345eee6c083d
parent 9c68fdceee073230f19ebb8b5e7fc71b479c0252
author Scott Chacon <schacon@gmail.com> 1268712581 -0700
committer Scott Chacon <schacon@gmail.com> 1268712581 -0700

fourth commit

Remember that the actual parent of 81a708d was our placeholder commit (622e88e), not 9c68fdce as it states here.

The other cool thing is that this is kept in our references:

$ git for-each-ref
e146b5f14e79d4935160c0e83fb9ebe526b8da0d commit	refs/heads/master
c6e1e95051d41771a649f3145423f8809d1a74d4 commit	refs/remotes/history/master
e146b5f14e79d4935160c0e83fb9ebe526b8da0d commit	refs/remotes/origin/HEAD
e146b5f14e79d4935160c0e83fb9ebe526b8da0d commit	refs/remotes/origin/master
c6e1e95051d41771a649f3145423f8809d1a74d4 commit	refs/replace/81a708dd0e167a3f691541c7a6463343bc457040

This means that it’s easy to share our replacement with others, because we can push this to our server and other people can easily download it. This is not that helpful in the history grafting scenario we’ve gone over here (since everyone would be downloading both histories anyhow, so why separate them?) but it can be useful in other circumstances. I’ll cover some other interesting scenarios in another post - I think this is probably enough to process for now.

Git's Little Bundle of Joy

2010-03-10T00:00:00+00:00

The scenario is thus: you need to sneakernet a git push. Maybe your network is down and you want to send changes to your co-workers. Perhaps you’re working somewhere onsite and don’t have access to the local network for security reasons. Maybe your wireless/ethernet card just broke. Maybe you don’t have access to a shared server for the moment, you want to email someone updates and you don’t want to transfer 40 commits via format-patch.

Enter git bundle. The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email or sneakernet around, then unbundle into another repository.

Let’s see a simple example. Let’s say you have a repository with two commits:

$ git log
commit 9a466c572fe88b195efd356c3f2bbeccdb504102
Author: Scott Chacon <schacon@gmail.com>
Date:   Wed Mar 10 07:34:10 2010 -0800

    second commit

commit b1ec3248f39900d2a406049d762aa68e9641be25
Author: Scott Chacon <schacon@gmail.com>
Date:   Wed Mar 10 07:34:01 2010 -0800

    first commit

If you want to send that repository to someone and you don’t have access to a repository to push to, or simply don’t want to set one up, you can bundle it.

$ git bundle create repo.bundle master
Counting objects: 6, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), 441 bytes, done.
Total 6 (delta 0), reused 0 (delta 0)

Now you have a file named repo.bundle that has all the data needed to re-create the repository. You can email that to someone else, or put it on a USB drive and walk it over.

Now on the other side, say you are sent this repo.bundle file and want to work on the project.

$ git clone repo.bundle -b master repo
Initialized empty Git repository in /private/tmp/bundle/repo/.git/
$ cd repo
$ git log --oneline
9a466c5 second commit
b1ec324 first commit

I had to specify -b master because otherwise it couldn’t find the HEAD reference for some reason, but you may not need to do that. The point is, you have now cloned directly from a file, rather than from a remote server.

Now let’s say you do three commits on it and want to send the new commits back via a bundle on a usb stick or email.

$ git log --oneline
71b84da last commit - second repo
c99cf5b fourth commit - second repo
7011d3d third commit - second repo
9a466c5 second commit
b1ec324 first commit

First we need to determine the range of commits we want to include in the bundle. The easiest way would have been to drop a branch when we started, so we could say start_branch..master or master ^start_branch, but if we didn’t we can just list the starting SHA explicitly:

$ git log --oneline master ^9a466c5
71b84da last commit - second repo
c99cf5b fourth commit - second repo
7011d3d third commit - second repo

So we have the list of commits we want to include in the bundle, let’s bundle em up. We do that with the git bundle create command, giving it a filename we want our bundle to be and the range of commits we want to go into it.

$ git bundle create commits.bundle master ^9a466c5
Counting objects: 11, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (9/9), 775 bytes, done.
Total 9 (delta 0), reused 0 (delta 0)

Now we will have a commits.bundle file in our directory. If we take that and send it to our partner, she can then import it into the original repository, even if more work has been done there in the meantime.

When she gets the bundle, she can inspect it to see what it contains before she imports it into her repository. The first command is the bundle verify command that will make sure the file is actually a valid Git bundle and that you have all the necessary ancestors to reconstitute it properly.

$ git bundle verify ../commits.bundle
The bundle contains 1 ref
71b84daaf49abed142a373b6e5c59a22dc6560dc refs/heads/master
The bundle requires these 1 ref
9a466c572fe88b195efd356c3f2bbeccdb504102 second commit
../commits.bundle is okay

If the bundler had created a bundle of just the last two commits they had done, rather than all three, the original repository would not be able to import it, since it is missing requisite history. The verify command would have looked like this instead:

$ git bundle verify ../commits-bad.bundle
error: Repository lacks these prerequisite commits:
error: 7011d3d8fc200abe0ad561c011c3852a4b7bbe95 third commit - second repo

However, our first bundle is valid, so we can fetch in commits from it. If you want to see what branches are in the bundle that can be imported, there is also a command to just list the heads:

$ git bundle list-heads ../commits.bundle
71b84daaf49abed142a373b6e5c59a22dc6560dc refs/heads/master

The verify sub-command will tell you the heads, too, as will a normal git ls-remote command, which you may have used for debugging before. The point is to see what can be pulled in, so you can use the fetch or pull commands to import commits from this bundle. Here we’ll fetch the ‘master’ branch of the bundle to a branch named ‘other-master’ in our repository:

$ git fetch ../commits.bundle master:other-master
From ../commits.bundle
 * [new branch]      master     -> other-master

Now we can see that we have the imported commits on the ‘other-master’ branch as well as any commits we’ve done in the meantime in our own ‘master’ branch.

$ git log --oneline --decorate --graph --all
* 8255d41 (HEAD, master) third commit - first repo
| * 71b84da (other-master) last commit - second repo
| * c99cf5b fourth commit - second repo
| * 7011d3d third commit - second repo
|/
* 9a466c5 second commit
* b1ec324 first commit

So, git bundle can be really useful for doing network-y, share-y operations when you don’t have the proper network or shared repository to do so.
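Here is the whole round trip condensed into one script you can run in a temp directory. Treat it as a sketch: the directory names (origin-repo, copy), the identity values and the single extra commit are stand-ins I chose, not the exact repositories shown above.

```shell
#!/bin/sh
# Sketch: full bundle, clone from it, incremental bundle, fetch it back.
set -e
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# The original repository with two commits on master.
git init -q origin-repo
cd origin-repo
git symbolic-ref HEAD refs/heads/master   # same branch name on any Git version
echo one > file.txt
git add file.txt
git commit -q -m 'first commit'
echo two >> file.txt
git commit -q -am 'second commit'
git bundle create ../repo.bundle master

# The other side clones straight from the bundle file.
cd "$work"
git clone -q -b master repo.bundle copy
cd copy
start=$(git rev-parse HEAD)
echo three >> file.txt
git commit -q -am 'third commit - second repo'

# Bundle only what is new since the clone.
git bundle create ../commits.bundle master "^$start"

# Back in the original repository: check the bundle, then import it.
cd "$work/origin-repo"
git bundle verify ../commits.bundle
git fetch -q ../commits.bundle master:other-master
imported=$(git log -1 --format=%s other-master)
echo "tip of other-master: $imported"
```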

Rerere Your Boat...

2010-03-08T00:00:00+00:00

One of the things I didn’t touch on at all in the book is the git rerere functionality. This also came up recently during one of my trainings, and I realize that a lot of people probably could use this, so I wanted to let you all know about it.

The git rerere functionality is a bit of a hidden feature (Git actually has a lot of cool hidden features, if you haven’t figured that out yet). The name stands for “reuse recorded resolution” and as the name implies, it allows you to ask Git to remember how you’ve resolved a hunk conflict so that the next time it sees the same conflict, Git can automatically resolve it for you.

There are a number of scenarios in which this functionality might be really handy. One of the examples that is mentioned in the documentation is if you want to make sure a long lived topic branch will merge cleanly but don’t want to have a bunch of intermediate merge commits. With rerere turned on you can merge occasionally, resolve the conflicts, then back out the merge. If you do this continuously, then the final merge should be easy because rerere can just do everything for you automatically.

This same tactic can be used if you want to keep a branch rebased so you don’t have to deal with the same rebasing conflicts each time you do it. Or if you want to take a branch that you merged and fixed a bunch of conflicts and then decide to rebase it instead - you likely won’t have to do all the same conflicts again.

The other situation I can think of is where you merge a bunch of evolving topic branches together into a testable head occasionally. If the tests fail, you can rewind the merges and re-do them without the topic branch that made the tests fail without having to re-resolve the conflicts again.

To enable the rerere functionality, you simply have to run this config setting:

$ git config --global rerere.enabled true

You can also turn it on by creating the .git/rr-cache directory in a specific repository, but I think the config setting is clearer, and it can be done globally.

Now let’s see a simple example. If we have a file that looks like this:

#! /usr/bin/env ruby

def hello
  puts 'hello world'
end

and in one branch we change the word ‘hello’ to ‘hola’, then in another branch we change the ‘world’ to ‘mundo’.

When we merge the two branches together, we’ll get a merge conflict:

$ git merge i18n-world
Auto-merging hello.rb
CONFLICT (content): Merge conflict in hello.rb
Recorded preimage for 'hello.rb'
Automatic merge failed; fix conflicts and then commit the result.

You should notice the new line Recorded preimage for FILE in there. Otherwise it should look exactly like a normal merge conflict. At this point, rerere can tell us some stuff. Normally, you might run git status at this point to see what all conflicted:

$ git status
# On branch master
# Unmerged paths:
#   (use "git reset HEAD <file>..." to unstage)
#   (use "git add <file>..." to mark resolution)
#
#	both modified:      hello.rb
#

However, git rerere will also tell you what it has recorded the pre-merge state for with git rerere status:

$ git rerere status
hello.rb

And git rerere diff will show the current state of the resolution - what you started with to resolve and what you’ve resolved it to.

$ git rerere diff
--- a/hello.rb
+++ b/hello.rb
@@ -1,11 +1,11 @@
 #! /usr/bin/env ruby

 def hello
-<<<<<<<
-  puts 'hello mundo'
-=======
+<<<<<<< HEAD
   puts 'hola world'
->>>>>>>
+=======
+  puts 'hello mundo'
+>>>>>>> i18n-world
 end

Also (and this isn’t really related to rerere), you can use ls-files -u to see the conflicted files and the before, left and right versions:

$ git ls-files -u
100644 39804c942a9c1f2c03dc7c5ebcd7f3e3a6b97519 1	hello.rb
100644 a440db6e8d1fd76ad438a49025a9ad9ce746f581 2	hello.rb
100644 54336ba847c3758ab604876419607e9443848474 3	hello.rb

Anyhow, so now you resolve it to just be “puts ‘hola mundo’” and you can run the rerere diff command again to see what rerere will remember:

$ git rerere diff
--- a/hello.rb
+++ b/hello.rb
@@ -1,11 +1,7 @@
 #! /usr/bin/env ruby

 def hello
-<<<<<<<
-  puts 'hello mundo'
-=======
-  puts 'hola world'
->>>>>>>
+  puts 'hola mundo'
 end

So that basically says, when I see a hunk conflict that has ‘hello mundo’ on one side and ‘hola world’ on the other, resolve it to ‘hola mundo’.

Now we can mark it as resolved and commit it:

$ git add hello.rb
$ git commit
Recorded resolution for 'hello.rb'.
[master 68e16e5] Merge branch 'i18n'

You can see that it “Recorded resolution for FILE”.

Now, let’s undo that merge and then rebase it on top of our master branch instead.

$ git reset --hard HEAD^
HEAD is now at ad63f15 i18n the hello

Our merge is undone. Now let’s rebase the topic branch.

$ git checkout i18n-world
Switched to branch 'i18n-world'
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: i18n one word
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging hello.rb
CONFLICT (content): Merge conflict in hello.rb
Resolved 'hello.rb' using previous resolution.
Failed to merge in the changes.
Patch failed at 0001 i18n one word

Now, we got the same merge conflict like we expected, but check out the Resolved FILE using previous resolution line. If we look at the file, we’ll see that it’s already been resolved:

$ cat hello.rb
#! /usr/bin/env ruby

def hello
  puts 'hola mundo'
end

Also, git diff will show you how it was automatically re-resolved:

$ git diff
diff --cc hello.rb
index a440db6,54336ba..0000000
--- a/hello.rb
+++ b/hello.rb
@@@ -1,7 -1,7 +1,7 @@@
  #! /usr/bin/env ruby

  def hello
-   puts 'hola world'
 -  puts 'hello mundo'
++  puts 'hola mundo'
  end

You can also recreate the conflicted file state with the checkout command:

$ git checkout --conflict=merge hello.rb
$ cat hello.rb
#! /usr/bin/env ruby

def hello
<<<<<<< ours
  puts 'hola world'
=======
  puts 'hello mundo'
>>>>>>> theirs
end

That might be a new command to you as well, the --conflict option to git checkout. You can actually have checkout do a couple of things in this situation to help you resolve conflicts. Another interesting value for that option is ‘diff3’, which will give you left, right and common to help you resolve the conflict manually:

$ git checkout --conflict=diff3 hello.rb
$ cat hello.rb
#! /usr/bin/env ruby

def hello
<<<<<<< ours
  puts 'hola world'
|||||||
  puts 'hello world'
=======
  puts 'hello mundo'
>>>>>>> theirs
end

Anyhow, then you can re-resolve it by just running rerere again:

$ git rerere
Resolved 'hello.rb' using previous resolution.
$ cat hello.rb
#! /usr/bin/env ruby

def hello
  puts 'hola mundo'
end

Magical re-resolving! Then you can add and continue the rebase to complete it.

$ git add hello.rb
$ git rebase --continue
Applying: i18n one word

So, if you do a lot of re-merges, or want to keep a topic branch up to date with your master branch without a ton of merges, or you rebase often or any of the above, turn on rerere to help your life out a bit.
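To watch rerere do its thing end to end, the sketch below sets up the same hello.rb conflict in a scratch repository, resolves it once, throws the merge away and merges again. It uses a second merge rather than a rebase to keep it short, and the identity values and commit messages are placeholders.

```shell
#!/bin/sh
# Sketch: record a conflict resolution once, then let rerere replay it.
set -e
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config rerere.enabled true            # repository-local for the demo

printf "def hello\n  puts 'hello world'\nend\n" > hello.rb
git add hello.rb
git commit -q -m 'initial'

git checkout -q -b i18n-world
printf "def hello\n  puts 'hello mundo'\nend\n" > hello.rb
git commit -q -am 'i18n one word'

git checkout -q master
printf "def hello\n  puts 'hola world'\nend\n" > hello.rb
git commit -q -am 'i18n the hello'

# First merge: conflicts, and rerere records the preimage.
git merge i18n-world || true
printf "def hello\n  puts 'hola mundo'\nend\n" > hello.rb
git add hello.rb
git commit -q -m "Merge branch 'i18n-world'"   # resolution is recorded here

# Throw the merge away and do it again.
git reset -q --hard HEAD^
git merge i18n-world || true    # still stops, but the file is already fixed

marker_count=$(grep -c '^<<<<<<<' hello.rb || true)
resolved_line=$(grep 'puts' hello.rb)
echo "conflict markers left: $marker_count"
echo "resolved line: $resolved_line"
```

Note that the second merge still stops and reports a conflict; rerere only fixes up the working file, so you still add and commit it yourself.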

Smart HTTP Transport

2010-03-04T00:00:00+00:00

When I was done writing Pro Git, the only transfer protocols that existed were the git://, ssh:// and basic http:// transports. I wrote about the basic strengths and weaknesses of each in Chapter 4. At the time, one of the big differences between Git and most other VCS’s was that HTTP was not a mainly used protocol - that’s because it was read-only and very inefficient. Git would simply use the webserver to ask for individual objects and packfiles that it needed. It would even ask for big packfiles even if it only needed one object from it.

As of the release of version 1.6.6 at the end of last year, however, Git can now use the HTTP protocol just about as efficiently as the git or ssh versions (thanks to the amazing work by Shawn Pearce, who also happened to have been the technical editor of Pro Git). Amusingly, it has been given very little fanfare - the release notes for 1.6.6 state only this:

* "git fetch" over http learned a new mode that is different from the
  traditional "dumb commit walker".

Which is a huge understatement, given that I think this will become the standard Git protocol in the very near future. I believe this because it’s both efficient and can be run either secure and authenticated (https) or open and unauthenticated (http). It also has the huge advantage that most firewalls have those ports (80 and 443) open already and normal users don’t have to deal with ssh-keygen and the like. Once most clients have updated to at least v1.6.6, http will have a big place in the Git world.

What is "Smart" HTTP?

Before version 1.6.6, when you cloned or fetched over HTTP, Git clients would basically just do a series of GETs to grab individual objects and packfiles on the server from bare Git repositories, since they know the layout of the repo. This functionality is documented fairly completely in Chapter 9. Conversations over this protocol used to look like this:

$ git clone http://github.com/schacon/simplegit-progit.git
Initialized empty Git repository in /private/tmp/simplegit-progit/.git/
got ca82a6dff817ec66f44342007202690a93763949
walk ca82a6dff817ec66f44342007202690a93763949
got 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Getting alternates list for http://github.com/schacon/simplegit-progit.git
Getting pack list for http://github.com/schacon/simplegit-progit.git
Getting index for pack 816a9b2334da9953e530f27bcac22082a9f5b835
Getting pack 816a9b2334da9953e530f27bcac22082a9f5b835
 which contains cfda3bf379e4f8dba8717dee55aab78aef7f4daf
walk 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
walk a11bef06a3f659402fe7563abf99ad00de2209e6

It is a completely passive server, and if the client needs one object in a packfile of thousands, the server cannot pull the single object out; the client is forced to request the entire packfile.

In contrast, the smarter protocols (git and ssh) would instead have a conversation with the git upload-pack process on the server which would determine the exact set of objects the client needs and build a custom packfile with just those objects and stream it over.

The new clients will now send a request with an extra GET parameter that older servers will simply ignore, but servers running the smart CGI will recognize and switch modes to a multi-POST mode that is similar to the conversation that happens over the git protocol. Once this series of POSTs is complete, the server knows what objects the client needs and can build a custom packfile and stream it back.

Furthermore, in the olden days if you wanted to push over http, you had to setup a DAV-based server, which was rather difficult and also pretty inefficient compared to the smarter protocols. Now you can push over this CGI, which again is very similar to the push mechanisms for the git and ssh protocols. You simply have to authenticate via an HTTP-based method, like basic auth or the like (assuming you don’t want your repository to be world-writable).

The rest of this article will explain setting up a server with the “smart”-http protocol, so you can test out this cool new feature. This feature is referred to as “smart” HTTP vs “dumb” HTTP because it requires having the Git binary installed on the server, where the previous incarnation of HTTP transfer required only a simple webserver. It has a real conversation with the client, rather than just dumbly pushing out data.

Setting up Smart HTTP

So, Smart-HTTP is basically just enabling the new CGI script that is provided with Git called git-http-backend on the server. This CGI will read the path and headers sent by the revamped git fetch and git push binaries who have learned to communicate in a specific way with a smart server. If the CGI sees that the client is smart, it will communicate smartly with it, otherwise it will simply fall back to the dumb behavior (so it is backward compatible for reads with older clients).
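You can poke at the CGI without any web server at all by handing it the environment variables a server would normally set. The following is a rough sketch, assuming your Git install ships git http-backend; the demo.git repository, the response file name and the identity values are hypothetical, and the request imitates what a smart client sends when it first asks for the refs.

```shell
#!/bin/sh
# Sketch: call git http-backend by hand, the way a CGI-capable server would.
set -e
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

root=$(mktemp -d)               # stands in for GIT_PROJECT_ROOT
git init -q "$root/work"
cd "$root/work"
echo hi > README
git add README
git commit -q -m 'first commit'
git clone -q --bare "$root/work" "$root/demo.git"

# The same two settings the Apache SetEnv lines provide.
export GIT_PROJECT_ROOT="$root"
export GIT_HTTP_EXPORT_ALL=1

# A smart client's first request: GET .../info/refs?service=git-upload-pack
REQUEST_METHOD=GET \
PATH_INFO=/demo.git/info/refs \
QUERY_STRING=service=git-upload-pack \
git http-backend > "$root/smart-response.txt"

# The body is binary-ish, so treat it as text (-a) and strip the CR.
smart_type=$(grep -a -i '^Content-Type:' "$root/smart-response.txt" | tr -d '\r')
echo "$smart_type"
```

The service parameter in the query string is the extra GET parameter mentioned above; the content type of the reply is how the client can tell it is talking to a smart server.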

To set it up, it’s best to walk through the instructions on the git-http-backend documentation page. Basically, you have to install Git v1.6.6 or higher on a server with an Apache 2.x webserver (it has to be Apache, currently - other CGI servers don’t work, last I checked). Then you add something similar to this to your httpd.conf file:

SetEnv GIT_PROJECT_ROOT /var/www/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Then you’ll want to make writes be authenticated somehow, possibly with an Auth block like this:

<LocationMatch "^/git/.*/git-receive-pack$">
        AuthType Basic
        AuthName "Git Access"
        Require group committers
        ...
</LocationMatch>

That is all that is really required to get this running. Now you have a smart http-based Git server that can do anonymous reads and authenticated writes with clients that have upgraded to 1.6.6 and above. How awesome is that? The documentation goes over more complex examples, like making it work with GitWeb and accelerating the dumb fallback reads, if you’re interested.

Rack-based Git Server

If you’re not a fan of Apache or you’re running some other web server, you may want to take a look at an app that I wrote called Grack, which is a Rack-based application for Smart-HTTP Git. Rack is a generic webserver interface for Ruby (similar to WSGI for Python) that has adapters for a ton of web servers. It basically replaces git http-backend for non-Apache servers that can’t run it.

This means that I can write the web handler independent of the web server and it will work with any web server that has a Rack handler. This currently means any FCGI server, Mongrel (and EventedMongrel and SwiftipliedMongrel), WEBrick, SCGI, LiteSpeed, Thin, Ebb, Phusion Passenger and Unicorn. Even cooler, using Warbler and JRuby, you can generate a WAR file that is deployable in any Java web application server (Tomcat, Glassfish, Websphere, JBoss, etc).

So, if you don’t use Apache and you are interested in a Smart-HTTP Git server, you may want to check out Grack. At GitHub, this is the adapter we’re using to eventually implement Smart-HTTP support for all the GitHub repositories. (It’s currently a tad bit behind, but I’ll be starting up on it again as soon as I get it into production at GitHub - send pull requests if you find any issues)

Grack is about half as fast as the Apache version for simple ref-listing stuff, but we’re talking tenths of a second. For most clones and pushes, the data transfer will be the main time-sink, so the load time of the app should be negligible.

In Conclusion

I think HTTP based Git will be a huge part of the future of Git, so if you’re running your own Git server, you should really check it out. If you’re not, GitHub and I’m sure other hosts will soon be supporting it - upgrade your Git client to 1.7ish soon so you can take advantage of it when it happens.

Undoing Merges

2010-03-02T00:00:00+00:00

I would like to start writing more here about general Git tips, tricks and upcoming features. There has actually been a lot of cool stuff that has happened since the book was first published, and a number of interesting things that I didn’t get around to covering in the book. I figure if I start blogging about the more interesting stuff, it should serve as a pretty handy guide should I ever start writing a second edition.

For the first such post, I’m going to cover a topic that was asked about at a training I did recently. The question was about a workflow where long running branches are merged occasionally, much like the Large Merging workflow that I describe in the book. They asked how to unmerge a branch, either permanently or allowing you to merge it in later.

You can actually do this a number of ways. Let’s say you have history that looks something like this:

You have a couple of topic branches that you have developed and then integrated together by a series of merges. Now you want to revert something back in the history, say ‘C10’ in this case.

The first way to solve the problem could be to rewind ‘master’ back to C8 and then merge the remaining two lines back in again. This requires that anyone you’re collaborating with knows how to handle rewound heads, but if that’s not an issue, this is a perfectly viable solution. This is basically how the ‘pu’ branch is handled in the Git project itself.

$ git checkout master
$ git reset --hard [sha_of_C8]
$ git merge jk/post-checkout
$ git merge db/push-cleanup

Once you rewind and remerge, you’ll instead have a history that looks more like this:

Now you can go back and work on that newly unmerged line and merge it again at a later point, or perhaps ignore it entirely.

Reverting a Merge

However, what if you didn’t find this out until later, or perhaps you or one of your collaborators have done work after this merge series? What if your history looks more like this:

Now you either have to revert one of the merges, or go back, remerge and then cherry-pick the remaining changes again (C9 and C10 in this case), which is confusing and difficult, especially if there are a lot of commits after those merges.

Well, it turns out that Git is actually pretty good at reverting an entire merge. Although you’ve probably only used the git revert command to revert a single commit (if you’ve used it at all), you can also use it to revert merge commits.

All you have to do is specify the merge commit you want to revert and the parent line you want to keep. Let’s say that we want to revert the merge of the jk/post-checkout line. We can do so like this:

$ git revert -m 1 [sha_of_C8]
Finished one revert.
[master 88edd6d] Revert "Merge branch 'jk/post-checkout'"
 1 files changed, 0 insertions(+), 2 deletions(-)

That will introduce a new commit that undoes the changes introduced by merging in the branch in the first place - sort of like a reverse cherry pick of all of the commits that were unique to that branch. Pretty cool.

However, we’re not done.

Reverting the Revert

Let’s say now that you want to re-merge that work again. If you try to merge it again, Git will see that the commits on that branch are in the history and will assume that you are mistakenly trying to merge something you already have.

$ git merge jk/post-checkout
Already up-to-date.

Oops - it did nothing at all. Even more confusing is if you went back and committed on that branch and then tried to merge it in, it would only introduce the changes since you originally merged.

Gah. Now that’s really a strange state and is likely to cause a bunch of conflicts or confusing errors. What you want to do instead is revert the revert of the merge:

$ git revert 88edd6d
Finished one revert.
[master 268e243] Revert "Revert "Merge branch 'jk/post-checkout'""
 1 files changed, 2 insertions(+), 0 deletions(-)

Cool, so now we’ve basically reintroduced everything that was in the branch that we had reverted out before. Now if we have more work on that branch in the meantime, we can just re-merge it.

$ git merge jk/post-checkout
Auto-merging test.txt
Merge made by recursive.
 test.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

So, I hope that’s helpful. This can be particularly useful if you have a merge-heavy development process. In fact, if you work mostly in topic branches before merging for integration purposes, you may want to use the git merge --no-ff option so that the first merge is not a fast forward and can be reverted out in this manner.
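To see the full revert, revert-the-revert and re-merge cycle in one place, here is a sketch you can run in a scratch repository. The branch name topic, the file names and the identity values are made up for the demo; it uses --no-ff so that there is a real merge commit to revert.

```shell
#!/bin/sh
# Sketch: revert a merge, revert the revert, then merge new work again.
set -e
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > test.txt
git add test.txt
git commit -q -m 'base'

git checkout -q -b topic
echo 'topic work' > topic.txt
git add topic.txt
git commit -q -m 'topic work'

git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic
merge_commit=$(git rev-parse HEAD)

# Undo the merge, keeping parent 1 (the master side).
git revert --no-edit -m 1 "$merge_commit"
revert_commit=$(git rev-parse HEAD)
if [ -e topic.txt ]; then after_revert=present; else after_revert=absent; fi

# Merging again does nothing: the topic commits are already in history.
remerge_output=$(git merge topic)

# Bring the changes back by reverting the revert.
git revert --no-edit "$revert_commit"

# New work on the topic branch now merges in as you would expect.
git checkout -q topic
echo 'more topic work' >> topic.txt
git commit -q -am 'more topic work'
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic' again" topic

echo "after revert: topic.txt is $after_revert"
echo "re-merge said: $remerge_output"
```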

Until next time.

This Year

2009-12-17T00:00:00+00:00

So I basically stopped blogging this year, more or less since I started working for GitHub. I’ve been busy, what can I say? I wrote a technical book through APress publishing called Pro Git, I’ve traveled all over the world speaking at dozens of conferences on Git and GitHub, and my wife had our first child, Josephine.

I think writing the book is what really killed the blogging. I was writing for hours every day or two, then spending hours each day the next week reading over comments telling me how I was both technically wrong and generally bad at writing in English. Sitting down and writing a blog post wasn’t really what I wanted to do after that.

Travel-wise it’s been a pretty amazing year. According to Tripit, I’ve traveled 74,360 mi to 31 cities in 11 countries over the course of 80 days this year. That’s almost a quarter of the year on the road. This year alone I’ve been to Shanghai, Tokyo, Amsterdam, Edinburgh, Berlin (twice), Poznan, Stockholm, Oslo, London (twice), Barcelona, Madrid, plus San Diego, Boston, Vegas, Chicago, New York, Virginia and Honolulu.

Physically I’ve changed kind of a lot, too. I got more or less in shape again this year, mostly due to a simple routine of running and the hundred pushups program. Though I haven’t quite yet made it all the way to 100 pushups in a row, I can now do about 70 or 80, which is pretty cool. I also lost about 30 pounds this year, from a high of about 195 when my baby was born to about 165 now.

Finally, there is Josephine. When the year started, Jessica was a couple months pregnant and now we have a little 5 month old baby girl. I was never really a fan of all the “it changes your whole life” comments, but it really does. It is by far the most amazing thing that has ever happened to me. She even has a little passport already and has traveled to Hawaii and Spain with us.

Overall, this has been one of the most amazing years of my life, and I’m really quite sad that I didn’t write more during it. I will definitely try to reverse that trend over the coming year as I have more amazing trips coming up (for example, Australia, New Zealand, Paris, Ireland just in the next few months), and there will be no shortage of baby stories and GitHub amazingness.

Looking forward to this year, everybody. Skál!

Translate This

2009-08-19T00:00:00+00:00

One of the things I love about Git is how easy it is to fork and contribute. More than once on GitHub I’ve put up some content and a handful of helpful souls go about translating it into another language. Well, it’s happened again with Pro Git. Since I released it under the Creative Commons license and put the code up on GitHub, several people have forked and started translating the book into other languages such as German, Chinese, Japanese and Russian.

I’ve been merging this work into my main repository to make it easier for people who want to help and now I’ve started publishing these translations on the main progit.org website. At the bottom of the page in the footer is now a list of ongoing translations (none of them are complete yet) that link to the translated content.

If you know another language and would like to make this Git resource available to others in that language, please either jump into helping with one of the ongoing translations or start a new one - I’ll begin publishing it at progit.org once it gets going.

The process to help with translations is to fork my main content repository, clone your new fork, make some changes and then send me a pull request through GitHub. Actually, even if you don’t send me a pull request, I tend to check out my network graph every few days and pull in things that look good.

So far more than 30 people have contributed translated material, errata or formatting fixes for my book since I put it up here - thanks to everyone! However, special thanks for all the help in translating the book so far to Mike Limansky (ru), Victor Portnov (ru), Egor Levichev (ru), Sven Fuchs (de), marcel (de), Fritz Thielemann (de), Yu Inao (ja), chunzi (zh) and DaNmarner (zh). Also huge thanks to Shane (duairc) for all the PDF creation software work. Keep it up!

The Gory Details

2009-07-28T00:00:00+00:00

First of all, thanks to everyone for spreading the word about this book and this site - I got way more attention on the first day live than I expected I might. More than 10,000 individuals visited the site in the first 24 hours of it being online for over 38k pageviews. We got on Hacker News, Reddit and Delicious Popular aggregators, not to mention the Twitterverse.

In the comments of the initial post here, a user asked me about the “gory details” of writing this book. Specifically about “what tools you used to create the book and its figures”. So here it is.

Somewhere else I read that someone liked that I used Markdown for writing the book, as you can download the Markdown source for the book at GitHub. Well, the entire writing process was unfortunately not done in Markdown. At Apress most of the editing and review process is still MS Word centric. Since I threw hissy fits at Word for the first few chapters the very nice people at Apress allowed me to write the remainder of the book in Markdown initially and the technical reviews were done via a Git repository on GitHub using the Markdown source.

So, for most of the book, the process was this: I would write the first draft of each chapter in Markdown and two reviewers would add comments inline. Then I would fix whatever they commented on and move the text into Word to submit it to the copy editor. The copy editor would review the Word document and let the technical editor have another pass, then I would fix up anything they commented on. Finally, I would get the chapter back in PDF form and do a final pass. Then I took that text and put it back into Markdown to generate the HTML for the website.

Fun, huh?

For the diagrams, I always use OmniGraffle. I personally think it’s one of the most amazing pieces of software ever created - I love it. I think normally an Apress designer would take whatever the author sketched and redo them, but in this case we actually just used the diagrams that I made. I just added the .graffle file to the GitHub repo if you’re interested in it.

Do What You Want

2009-02-19T00:00:00+00:00

My friend Chris Wanstrath gave the keynote talk yesterday at StartupRiot on his lessons learned in building a successful startup company. A comment on the Hacker News post about it summed the keynote up thusly:

dont be afraid to fail, there is lots to learn from it, follow what you genuinely love and use, and do what works for you best

I largely agree, and I like the talk. What I wanted to comment on was what I read on the same Hacker News post:

I know that’s a popular piece of advice given around here (though HN has an interesting mix of “stuffy” academia and “Wild West” hacker culture), and I felt the same way when I was younger. I thought I knew all the CS I needed to know to develop the stuff I’d want to develop. But I want to say that I am so overwhelmingly happy I have done CS at the college level. I knew so little back then (and still). It scares me that I almost neglected the path of inquiry that is /the/ single most interesting intellectual topic in my life.

That commenter was talking about a part at the end of Chris’s talk where he mentions a kid that was trying to decide if he should go to college to learn CS or just work at a startup. Chris’s advice is to:

Do whatever you want. Don’t take VC because Twitter did. Don’t piss off investors because 37signals likes to. Do whatever you want, whatever works best for you, what makes you the most comfortable and seems like the best idea. Do what you love.

I love the advice to not do what others did just because that is what is done, but I would probably advise that guy to go to college. Not because it’s the only way to get things done, but because it’s a great way to get a broader perspective. I’ve found that the better programmers I know are people that have interests in things other than programming. I discovered tons of things that I would not have normally been interested in but made me think differently because of classes I took in college.
