In another of my series of “seemingly hidden Git features”, I would like to
introduce git replace
. Now, documentation exists for git replace
, but it
is rather unclear on what you would actually use it for (or even what it really
does), so let’s go through a couple of examples as it really is quite powerful.
The replace
command basically will take one object in your Git database and
for most purposes replace it with another object. This is most commonly useful
for replacing one commit in your history with another one.
For example, say you want to split your history into one short history for
new developers and one much longer and larger history for people interested
in data mining. You can graft one history onto the other by replace
ing
the earliest commit in the new line with the latest commit on the older one. This
is nice because it means that you don’t actually have to rewrite every commit
in the new history, as you would normally have to do to join them together
(because the parentage effects the SHAs).
Let’s try this out. Let’s take an existing repository, split it into two
repositories, one recent and one historical, and then we’ll see how we can
recombine them without modifying the recent repositories SHA values via replace
.
We’ll use a simple repository with five simple commits:
$ git log --oneline
ef989d8 fifth commit
c6e1e95 fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit
We want to break this up into two lines of history. One line goes from commit one to commit four - that will be the historical one. The second line will just be commits four and five - that will be the recent history.
Well, creating the historical history is easy, we can just put a branch in the history and then push that branch to the master branch of a new remote repository.
$ git branch history c6e1e95
$ git log --oneline --decorate
ef989d8 (HEAD, master) fifth commit
c6e1e95 (history) fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit
Now we can push the new history branch to the master branch of our new repository:
$ git remote add history git@github.com:schacon/project-history.git
$ git push history history:master
Counting objects: 12, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (12/12), 907 bytes, done.
Total 12 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
To git@github.com:schacon/project-history.git
* [new branch] history -> master
OK, so our history is published. Now the harder part is truncating our current history down so it’s smaller. We need an overlap so we can replace a commit in one with an equivalent commit in the other, so we’re going to truncate this to just commits four and five (so commit four overlaps).
$ git log --oneline --decorate
ef989d8 (HEAD, master) fifth commit
c6e1e95 (history) fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit
I think it’s useful in this case to create a base commit that has instructions
on how to expand the history, so other developers know what to do if they
hit the first commit in the truncated history and need more. So, what we’re
going to do is create an initial commit object as our base point with instructions,
then rebase the remaining commits (four and five) on top of it. To do that,
we need to choose a point to split at, which for us is the third commit, which
is 9c68fdc
in SHA-speak. So, our base commit will be based off of that tree.
We can create our base commit using the commit-tree
command, which just takes
a tree and will give us a brand new, parentless commit object SHA back.
$ echo 'get history from blah blah blah' | git commit-tree 9c68fdc^{tree}
622e88e9cbfbacfb75b5279245b9fb38dfea10cf
OK, so now that we have a base commit, we can rebase the rest of our history
on top of that with git rebase --onto
. The --onto
argument will be the
SHA we just got back from commit-tree
and the rebase point will be the third
commit (9c68fdc
again):
$ git rebase --onto 622e88 9c68fdc
First, rewinding head to replay your work on top of it...
Applying: fourth commit
Applying: fifth commit
OK, so now we’ve re-written our recent history on top of a throw away base commit
that now has instructions in it on how to reconstitute the entire history if
we wanted to. Now let’s see what those instructions would be (here is where
replace
finally comes into play).
So to get the history data after cloning this truncated repository, one would have to add a remote for the historical repository and fetch:
$ git remote add history git://github.com/schacon/project-history.git
$ git fetch history
From git://github.com/schacon/project-history.git
* [new branch] master -> history/master
Now the collaborator would have their recent commits in the ‘master’ branch and the historical commits in the ‘history/master’ branch.
$ git log --oneline master
e146b5f fifth commit
81a708d fourth commit
622e88e get history from blah blah blah
$ git log --oneline history/master
c6e1e95 fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit
To combine them, you can simply call git replace
with the commit you want
to replace and then the commit you want to replace it with. So we want to
replace the ‘fourth’ commit in the master branch with the ‘fourth’ commit in
the ‘history/master’ branch:
$ git replace 81a708d c6e1e95
Now, if you look at the history of the master
branch, it looks like this:
$ git log --oneline
e146b5f fifth commit
81a708d fourth commit
9c68fdc third commit
945704c second commit
c1822cf first commit
Cool, right? Without having to change all the SHAs upstream, we were able to
replace one commit in our history with an entirely different commit and all the
normal tools (bisect
, blame
, etc) will work how we would expect them to.
Interestingly, it still shows 81a708d
as the SHA, even though it’s actually
using the c6e1e95
commit data that we replaced it with. Even if you run a
command like cat-file
, it will show you the replaced data:
$ git cat-file -p 81a708d
tree 7bc544cf438903b65ca9104a1e30345eee6c083d
parent 9c68fdceee073230f19ebb8b5e7fc71b479c0252
author Scott Chacon <schacon@gmail.com> 1268712581 -0700
committer Scott Chacon <schacon@gmail.com> 1268712581 -0700
fourth commit
Remember that the actual parent of 81a708d
was our placeholder commit (622e88e
),
not 9c68fdce
as it states here.
The other cool thing is that this is kept in our references:
$ git for-each-ref
e146b5f14e79d4935160c0e83fb9ebe526b8da0d commit refs/heads/master
c6e1e95051d41771a649f3145423f8809d1a74d4 commit refs/remotes/history/master
e146b5f14e79d4935160c0e83fb9ebe526b8da0d commit refs/remotes/origin/HEAD
e146b5f14e79d4935160c0e83fb9ebe526b8da0d commit refs/remotes/origin/master
c6e1e95051d41771a649f3145423f8809d1a74d4 commit refs/replace/81a708dd0e167a3f691541c7a6463343bc457040
This means that it’s easy to share our replacement with others, because we can push this to our server and other people can easily download it. This is not that helpful in the history grafting scenario we’ve gone over here (since everyone would be downloading both histories anyhow, so why seperate them?) but it can be useful in other circumstances. I’ll cover some other interesting scenarios in another post - I think this is probably enough to process for now.