If you have ever been to London, you probably know how complicated London tube map looks like :)
There is no way for me to understand this unless I look at it for 100 times (and it was actually the reality).
Most of the Git repository commit histories I look at nowadays are not much different from this map. For example, I cloned the original Git source code and wanted to look at its commit history. I wish I didn’t do that as my eyes hurt really bad:
This is the view of gitk but it doesn’t matter what you use here. It looks as bad as the above one when you are just looking at the log:
Maybe I am being a little skeptical about it and maybe, it’s useful information there. Let’s look a little closer to see what it actually says:
90% of the commits above are rubbish to me. YES, I am talking about those meaningless merge commits. They have no really value, they are just implementation details that happened during development and it makes no sense when I look at them later. I previously blogged about how rebase can help taking away this pain. However, it’s hard to apply rebasing model if you really don’t know what you are doing. I wanted to dig a little deeper in this post and share my opinions on this controversial topic.
How We Are Ending up With This Mess
Let’s take a very simple example and work on that to simulate how we are ending up with a mess like this. I have two repositories: one is called london-tube-main (upstream) and the other one is london-tube-fork (origin). I only have one commit under upstream/master branch and I have one more additional commit under origin/doing-stuff branch as you can see below:
All seems good so far. I have the stable code inside upstream/master and I am working on a new stuff under origin/doing-stuff. Now, let’s make a new commit to upstream/master which we still continue cracking on origin/doing-stuff.
At that point, the origin/doing-stuff branch is out of sync. The logical option here is to sync with upstream/master before continue adding new stuff. In order to do that, I ran git merge upstream/master when I was under origin/doing-stuff. If I look at what happened there after running it, I would see something like this:
Git merge manual explains what actually happened here really well and I copied the description by changing only the references:
Then "git merge upstream/master" will replay the changes made on the upstream/master branch since it diverged from master (i.e., 1) until its current commit (3) on top of master, and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.
Our simple progress flow would look like this now:
At this state, I am in a feature branch working on my stuff. I now recorded a commit practically saying "I synced my branch, yay me!" which is pointless when you want to get this work into your stable branch. Let’s add one more commit on origin/doing-stuff. At the end, we have the following look:
Except from the unnecessary merge commit, it is not that bad but this can get worse. Right now if you want to merge origin/doing-stuff branch into upstream/master, it will be a fast-forward merge which means only updating the branch pointer, without creating a merge commit. However, some prefers to disable this behavior with --no-ff switch for the merge command which makes it possible to create a merge commit even when the merge resolves as a fast-forward. It makes the history look even worse by doing this.
Repeat this process over and over again, you will be really close to your own version of London tube map™!
How Can We Get Rid off This Mess
Let’s go back to one of our earlier states on our example:
To refresh our memories, we now want to keep working on origin/doing-stuff branch but we are out of sync with the upstream/master branch. Previously, we blindly merged upstream/master into origin/doing-stuff branch but this time will rebase origin/doing-stuff onto upstream/master:
We are now in sync with upstream/master but what happened here is actually really clever. When you are doing the rebase, git first takes your new commits and puts the upstream/master commits onto your branch. Later, it will apply each of your commits one by one. If you don’t have any conflicts, it will be a successful rebase as it was in our case here.
Notice that the 2 commit is now new-2 commit. In reality, commits are identified by SHA1 hash and when you do a rebase, hash of your new commits will be recalculated as the parent commit is now changed, which will result in a rewritten history inside your feature branch. If you try to push your changes to origin/doing-stuff branch now, it will fail as the history is changed:
You can however force your changes to be pushed with git push --force command which will practically replace your history with the remote one:
You should really avoid doing force pushes against a branch which you share with somebody else. You really don’t want to piss your mates off :)
It’s generally OK to force push into a feature branch inside your own fork which you are the only one who is working on it. Also, if you have an open pull request on GitHub attached to that branch, it will update the pull request when you force push which is nice. General rule of thumb here is that you should work on your own fork and shouldn’t push a feature branch into the shared remote upstream repository. This will generally make your life easier.
When you now add new commits and try to get these changes into upstream/master, it will just be a fast-forward merge (unless you have --no-ff merge rule in place).
Now, the history is so much cleaner:
If you are on a long term big project where more than one person is involved, using merging like this in an obnoxious way and it will make you and your team suffer. You will feel in the same way like you felt when you first landed in London and picked up the London tube map, which is confusing, terrified. I agree that applying rebasing is hard but spend some time on this to get it right throughout your team. Do not even hesitate in spending a day to practice the flow in order to get it right. It may seem not important but it actually really is when you need to dig into the history of the code (e.g. git bisect). A few simple rules that I follow which may also be helpful for you.
- Do your dirty stuff inside your own fork (you can go crazy, no one will care).
- Force --ff-only globally: git config branch.master.mergeoptions "--ff-only"
- When you are on a feature branch, always rebase or pull --rebase.
- To make the history look more meaningful and modular, make use of git add -p and git rebase –i
- Never rebase any shared branch onto your feature branch and force push to any shared branch. There is a really good chance of pissing your coworkers off by doing this :)