Resistance Against London Tube Map Commit History (a.k.a. Git Merge Hell)

It's so easy to end up with a Git commit history that looks like the London tube map. Let's see how we end up with those big, ugly, meaningless commit histories and how to prevent them.
2015-03-08 17:18
Tugberk Ugurlu


If you have ever been to London, you probably know how complicated the London tube map looks :)

2014-12-17 20.23.39

There is no way for me to understand this unless I look at it a hundred times (and that is what it actually took).

Most of the Git repository commit histories I look at nowadays are not much different from this map. For example, I cloned the original Git source code and wanted to look at its commit history. I wish I hadn’t, as my eyes hurt really badly:

Screenshot 2015-03-07 12.24.31

This is the gitk view, but it doesn’t matter which tool you use here. It looks just as bad when you are only looking at the log:

image

Maybe I am being a little too skeptical, and maybe there is useful information in there. Let’s look a little closer to see what it actually says:

image

90% of the commits above are rubbish to me. YES, I am talking about those meaningless merge commits. They have no real value; they are just implementation details that happened during development, and they make no sense when I look at them later. I previously blogged about how rebase can help take this pain away. However, it’s hard to apply the rebasing model if you don’t really know what you are doing. I wanted to dig a little deeper in this post and share my opinions on this controversial topic.
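If you want to check how many of the commits in a history are merges rather than eyeballing it, the following is a minimal sketch. The merge_share function and the throwaway repository are made up for illustration; only the git commands themselves are real.

```shell
#!/bin/sh
# Sketch: report what share of the commits reachable from HEAD are merge commits.
set -e

# Hypothetical helper, not part of Git: counts all commits and merge commits.
merge_share() {
  total=$(git rev-list --count HEAD)
  merges=$(git rev-list --count --merges HEAD)
  echo "$merges of $total commits are merges ($((100 * merges / total))%)"
}

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Throwaway repository with three ordinary commits and one merge commit.
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master
commit 3
git checkout -q doing-stuff
git merge -q -m "Merge master into doing-stuff" master

merge_share
# The same history with the merge commits hidden from the log.
git log --oneline --no-merges
```

Run inside a real clone, merge_share alone is enough; the setup part is only there so the sketch works anywhere.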

How We Are Ending up With This Mess

Let’s take a very simple example and work through it to simulate how we end up with a mess like this. I have two repositories: one is called london-tube-main (upstream) and the other is london-tube-fork (origin). I have only one commit on the upstream/master branch and one additional commit on the origin/doing-stuff branch, as you can see below:

Picture6

All seems good so far. I have the stable code in upstream/master and I am working on new stuff in origin/doing-stuff. Now, let’s make a new commit to upstream/master while we continue cracking on with origin/doing-stuff.

Picture5
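To play along, the state above can be rebuilt locally. This is only a sketch under a few assumptions: two bare repositories in a temporary directory stand in for the hosted london-tube-main and london-tube-fork, the file names and the identity are placeholders, and the commit helper is made up for brevity.

```shell
#!/bin/sh
# Sketch: rebuild the diverged state from the example in a throwaway directory.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"

# Two bare repositories stand in for the main repository and the fork.
git init -q --bare london-tube-main.git
git init -q --bare london-tube-fork.git

# Working copy with the two remotes named as in the post.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git remote add upstream "$demo/london-tube-main.git"
git remote add origin "$demo/london-tube-fork.git"

# Commit 1 lands on upstream/master.
commit 1
git push -q upstream master

# Commit 2 lives on origin/doing-stuff.
git checkout -q -b doing-stuff
commit 2
git push -q origin doing-stuff

# Commit 3 is added to upstream/master while doing-stuff is still in progress.
git checkout -q master
commit 3
git push -q upstream master

# The two branches now share only commit 1.
git log --graph --oneline --all
```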

At that point, the origin/doing-stuff branch is out of sync. The logical option here is to sync with upstream/master before continuing to add new stuff. To do that, I ran git merge upstream/master while on origin/doing-stuff. If I look at what happened after running it, I see something like this:

image
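Here is a minimal sketch of that sync merge, assuming a single throwaway repository in which the local master branch stands in for upstream/master. The commit helper and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: syncing a feature branch by merging records a commit with two parents.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master                    # stands in for upstream/master
commit 3

# Sync the feature branch the "merge" way.
git checkout -q doing-stuff
git merge -q -m "Merge master into doing-stuff" master

# The new commit lists two parents: the old branch tip (2) and the merged tip (3).
git log -1 --format='%s | parents: %p'
git log --graph --oneline
```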

The git merge manual explains really well what actually happened here. I copied the description, changing only the references:

Then "git merge upstream/master" will replay the changes made on the upstream/master branch since it diverged from origin/doing-stuff (i.e., 1) until its current commit (3) on top of origin/doing-stuff, and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.

Our simple progress flow would look like this now:

Picture7

In this state, I am on a feature branch working on my stuff. I have now recorded a commit practically saying "I synced my branch, yay me!", which is pointless when you want to get this work into your stable branch. Let’s add one more commit on origin/doing-stuff. In the end, it looks like this:

Picture8

Apart from the unnecessary merge commit, it is not that bad, but it can get worse. Right now, if you merge the origin/doing-stuff branch into upstream/master, it will be a fast-forward merge, which means only the branch pointer is updated and no merge commit is created. However, some prefer to disable this behavior with the --no-ff switch of the merge command, which creates a merge commit even when the merge resolves as a fast-forward. Doing this makes the history look even worse.

image
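The difference between the two merge styles can be compared side by side. This sketch assumes a throwaway repository where local master stands in for upstream/master; the ff-demo and no-ff-demo branches and the commit helper are made up so that both styles can start from the same point.

```shell
#!/bin/sh
# Sketch: fast-forward merge versus --no-ff merge of the same feature branch.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# History so far: 1 and 3 on master; 1, 2, a sync merge and 4 on doing-stuff.
commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master
commit 3
git checkout -q doing-stuff
git merge -q -m "Merge master into doing-stuff" master
commit 4

# Two copies of master so both merge styles start from the same commit.
git checkout -q master
git branch ff-demo
git branch no-ff-demo

git checkout -q ff-demo
git merge -q --ff-only doing-stuff        # only moves the branch pointer

git checkout -q no-ff-demo
git merge -q --no-ff -m "Merge doing-stuff" doing-stuff   # always records a merge commit

echo "merge commits after fast-forward: $(git rev-list --count --merges ff-demo)"
echo "merge commits after --no-ff:      $(git rev-list --count --merges no-ff-demo)"
```

The fast-forwarded branch carries only the one sync merge, while the --no-ff branch carries two merge commits for the same work.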

Repeat this process over and over again and you will be really close to your own version of London tube map™!

How Can We Get Rid of This Mess

Let’s go back to one of the earlier states in our example:

Picture5

To refresh our memories, we want to keep working on the origin/doing-stuff branch but we are out of sync with the upstream/master branch. Previously, we blindly merged upstream/master into origin/doing-stuff, but this time we will rebase origin/doing-stuff onto upstream/master:

image

We are now in sync with upstream/master, but what happened here is actually really clever. When you rebase, git first sets your new commits aside and moves your branch to the tip of upstream/master. Then it re-applies each of your commits one by one on top. If there are no conflicts, the rebase succeeds, as it did in our case here.

Picture10
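The same sync done with a rebase can be sketched as follows. It assumes a throwaway repository where local master stands in for upstream/master; the commit helper and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: syncing a feature branch by rebasing re-creates its commits on the new base.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master                    # stands in for upstream/master
commit 3
git checkout -q doing-stuff

before=$(git rev-parse HEAD)              # hash of commit 2 before the rebase
git rebase -q master
after=$(git rev-parse HEAD)               # hash of the re-created commit ("new-2")

echo "commit 2 before rebase: $before"
echo "commit 2 after rebase:  $after"

# The history is linear: 1, 3, then the re-created 2, with no merge commit.
git log --graph --oneline
```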

Notice that commit 2 is now commit new-2. In reality, commits are identified by a SHA-1 hash, and when you rebase, the hashes of your commits are recalculated because their parent commit has changed. The result is a rewritten history inside your feature branch. If you try to push your changes to the origin/doing-stuff branch now, it will fail because the history has changed:

image

You can, however, force your changes to be pushed with the git push --force command, which will practically replace the remote history with your local one:

image
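The rejected push and the forced one can be reproduced with a local bare repository. This is a sketch only: fork.git is a made-up stand-in for the fork on the server, local master stands in for upstream/master, and the commit helper is hypothetical.

```shell
#!/bin/sh
# Sketch: a push after a rebase is rejected until it is forced.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"
git init -q --bare fork.git               # stand-in for the fork (origin)
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$demo/fork.git"

commit 1
git checkout -q -b doing-stuff
commit 2
git push -q origin doing-stuff            # the fork now has the original commit 2

git checkout -q master
commit 3
git checkout -q doing-stuff
git rebase -q master                      # commit 2 is re-created with a new hash

# A plain push is rejected: the remote branch tip is no longer an ancestor of ours.
if git push -q origin doing-stuff 2>/dev/null; then
  echo "push accepted"
else
  echo "push rejected (non-fast-forward)"
fi

# Forcing overwrites the remote branch with the local, rebased history.
git push -q --force origin doing-stuff
```

If you want a bit more safety, git push --force-with-lease refuses to overwrite the remote branch when it has moved since you last fetched it.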

DANGER ALERT!

You should really avoid doing force pushes against a branch which you share with somebody else. You really don’t want to piss your mates off :)

It’s generally OK to force push to a feature branch inside your own fork if you are the only one working on it. Also, if you have an open pull request on GitHub attached to that branch, the pull request is updated when you force push, which is nice. The general rule of thumb here is that you should work on your own fork and shouldn’t push a feature branch to the shared upstream repository. This will generally make your life easier.

When you now add new commits and try to get these changes into upstream/master, it will just be a fast-forward merge (unless you have a --no-ff merge rule in place).

image

Now, the history is so much cleaner:

image
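Put together, the whole rebase-based flow looks roughly like this. As before, it is a sketch in a throwaway repository with local master standing in for upstream/master and a made-up commit helper.

```shell
#!/bin/sh
# Sketch: rebase, add more work, then fast-forward the stable branch.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master                    # stands in for upstream/master
commit 3

# Sync by rebasing, then keep working on the feature branch.
git checkout -q doing-stuff
git rebase -q master
commit 4

# Getting the work into master is a pure fast-forward: no merge commit at all.
git checkout -q master
git merge -q --ff-only doing-stuff

git log --graph --oneline
```

The log comes out as a single straight line of four commits, newest first: 4, 2, 3, 1.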

Conclusion

If you are on a big, long-term project where more than one person is involved, merging like this is obnoxious and it will make you and your team suffer. You will feel the same way you felt when you first landed in London and picked up the tube map: confused and terrified. I agree that applying rebasing is hard, but spend some time on it to get it right throughout your team. Do not hesitate to spend a day practicing the flow in order to get it right. It may not seem important, but it really is when you need to dig into the history of the code (e.g. git bisect). Here are a few simple rules that I follow, which may also be helpful for you:

  • Do your dirty stuff inside your own fork (you can go crazy, no one will care).
  • Force --ff-only globally: git config --global merge.ff only (or only for master in a single repository: git config branch.master.mergeoptions "--ff-only")
  • When you are on a feature branch, always rebase or pull --rebase.
  • To make the history look more meaningful and modular, make use of git add -p and git rebase -i
  • Never rebase a shared branch onto your feature branch, and never force push to a shared branch. There is a really good chance of pissing your coworkers off by doing this :)
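The configuration side of these rules can be tried out safely first. The sketch below sets the options for one throwaway repository only (add --global to apply them to every repository of the current user); the repository, the identity and the commit helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: with merge.ff set to "only", a non-fast-forward merge is refused.
set -e

# Hypothetical helper: creates a one-file commit whose message is its argument.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# The rules, set for this repository only.
git config merge.ff only                  # refuse merges that are not fast-forwards
git config pull.rebase true               # git pull rebases instead of merging

commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master
commit 3
git checkout -q doing-stuff

# The branches have diverged, so a sync merge is now refused ...
if git merge -q master 2>/dev/null; then
  echo "merged"
else
  echo "merge refused: not a fast-forward"
fi

# ... while a rebase still works and keeps the history linear.
git rebase -q master
git log --graph --oneline
```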


Comments

by Andrea Angella on Monday, Mar 09 2015 08:39:34 +00:00
Great post. My team is relatively new to git and we are still doing merges. I am already suggesting to use rebase but I need to learn it myself as well. Do you use a tool for it or just the command line?
by Tugberk on Monday, Mar 09 2015 09:21:41 +00:00
I don't think there is a tool for making the rebase and interactive rebase feel better than command line. I use command line for everything anyway :) I can also understand that it's a hard one to adopt at the team level.
by Chris Marisic on Monday, Mar 09 2015 17:30:26 +00:00
I'm an anti-rebase person. The risk of rebase is unacceptable. The common response to this is "well, do it right". Anything that depends on humans "doing it right" to not screw things up is a ticking time bomb. When it is up to me, rebase is banned. If you rebase anyway and screw it up, thereby screwing up the repository, I would likely fire you over it. This is why I am anti-git. Git prides itself on source code destructive actions, where Mercurial does everything possible to prevent you from destroying your source code. I find it funny that probably the most popular version control system is inherently source UNsafe.
by Ciantic on Saturday, Mar 21 2015 14:57:39 +00:00
Rebasing works in isolated occasions, but maybe there should be a way to suppress the "I synced my branch, yay me!" commits from the GUI.
by Michal on Tuesday, Apr 07 2015 08:02:44 +00:00
@Chris Marisic: your approach seems to be without reason; banning something "just because" will in the end be a lost argument. Obviously human error can and in the end will introduce issues, yet both merging and rebasing require oversight by humans, especially when conflicts are recognised by the source control of your choice. Yet in this case, who will be more competent to resolve these issues than the person who wrote the conflicting code? If you refer only to rebase and force push, I have to agree that this is dangerous even if the branch belongs only to a single developer. I am a proponent of a more "use the right tool for the job" approach, and on projects I oversee we rebase a development branch for each small feature and branch+merge larger features. Released versions are just done as branches from code-freeze points on the same development branch. Do you know that in git non fast forward commits can be disallowed? I left permission to do them only for some trusted individuals.
by Alberto Chiesa on Tuesday, Apr 07 2015 08:39:53 +00:00
I'm with Chris on this. Rebase is cool, but only if you know exactly what you're doing. If you're on a team of 3+ people, you should really care about rewriting history for other teammates. The article explains that you should rewrite history (thereby: lying about what you did) to present a more concise and comprehensible version of the facts. This is not different than someone telling you a story and you asking them "keep it short and straight". The problem with a recap is that, sometimes, crucial details get lost. I prefer an exact, messy, version of the events than a clearly written lie, especially when, if you mess up the rewriting, you're going to lose track of the modifications, which should be THE task of the CVS. Just my 2c.
by David V. Corbin on Tuesday, Apr 07 2015 11:00:51 +00:00
There are many different project environments, the following applies to typical business scenarios... "even if the (feature) branch belongs only to a single developer".... This is the way of isolationist, lone wolf mentalities rather than collaborative teamwork. As soon as the work period exceeds a short time (a few hours in most cases, under an hour in some) then there is value in sharing it with the teammates. As a result, the approach taken of using rebase will be happening on branches that are indeed shared.
by a@b.cn on Tuesday, Apr 07 2015 11:10:30 +00:00
To the anti-rebase person above, the risk is not because of the rebasing, but because your central git repo allows force push. I could reset --hard, and force push that, and your server apparently happily accepts that. Put a hook in there that prevents throwing away work!
by Rob Jellinghaus on Tuesday, Apr 07 2015 17:21:25 +00:00
The article is compellingly written, but after reading through the comments, I now think this is a UI problem. Vacuous merge commits should be able to be suppressed by the various commit history viewers. Don't destroy the history, which is accurate and sensible; instead, improve the visualizations to more selectively filter out all merge commits -- but with the ability to selectively show them again in cases where it matters. Are there any enhanced git history viewers that already do this?
by steveC on Tuesday, Apr 14 2015 00:00:46 +00:00
At the risk of sounding like a troll, this is just a weakness of Git that just "is"... Git is inherently decentralized, which does not remove the need for centralization, it just moves the task to the humans doing the development, which is why we need rebase. Making the system handle centralization has been done already. Subversion, TFS, Perforce, etc etc etc, but they are not new and shiny anymore, so here we are with best practices text on how to behave in a centralized manner with a decentralized platform. However, we live in the world we live in, and Git is it today. I appreciate the article. It is now in my mental bag of tricks. :) Thanks Tugberk!
by rob on Friday, Jun 26 2015 21:23:37 +00:00
... Or you can simply use another DVCS or VCS (cvs, svn or alike, or maybe mercurial). Not the best tools in the world, but to work cooperatively in a branch they at least do not require too much knowledge or training before you can actually be productive (every intern we have spends a while simply figuring out how the hell to check in a line of code in git projects), and you actually understand what you are doing - which I believe is part of being less dangerous in checking in / pushing code