Deploying software has been a constant challenge, possibly from the very start. It could be a web application, HTTP services, a PlayStation app, or an application running inside a Raspberry Pi: all of them share the challenge of deploying new changes. It can even push you toward a different architecture to make the deployments scale. One of the big challenges of software deployment is that there is no single generic rule or practice that you can apply to make everything shiny. This doesn't mean that we don't have generic practices or techniques around it. However, the way your software lives in its destination(s) and the way it's being consumed are factors that you need to take into consideration before coming up with a deployment strategy.
There is a bit of a catch-22 here as you sometimes shape the architecture around the deployment strategy (I am not saying this is good or bad; I am not exactly sure yet). So, there is a fine balance you need to strike here to grab the ropes of success.
So, what I am going to tell you now might not fit everyone, but I believe it is the first step to making your deployment strategy a viable one. I will throw it out here: your Git repository is your deployment boundary. This idea is subtle, easy to grasp and easy to reason from. What this really means is that all the things you have inside one repository will be part of only one deployment strategy. The main reason for this is versioning. Git allows you to tag your repositories, and you can use this feature to route a release into different deployment pipelines based on the type of the change (e.g. minor, patch or major). See my previous post on versioning builds and releases through SemVer for a bit more detail on this. However, it will get messy if you try to maintain multiple different versions inside the same repository for different components. You will easily lose track of how two components relate to each other over time.
Ask yourself this when you are structuring your repositories: does any component inside a repository have a different release cadence? If the answer is yes, try to think of why and, if the reasons are legitimate, give that component a new, separate home.
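To make the "type of the change" idea a bit more concrete, here is a minimal sketch of how a pipeline could classify a release by comparing the previous tag with the new one. The change_type function is a hypothetical helper, not something Git or any CI system provides, and it assumes plain MAJOR.MINOR.PATCH tags where the new version is not lower than the old one.

```shell
#!/bin/bash
# Hypothetical helper: classify the change between two MAJOR.MINOR.PATCH tags.
# Assumes plain release versions (no prerelease or build metadata) and new >= old.
change_type() {
  local old_major old_minor old_patch new_major new_minor new_patch
  IFS=. read -r old_major old_minor old_patch <<< "$1"
  IFS=. read -r new_major new_minor new_patch <<< "$2"

  if [ "$new_major" -gt "$old_major" ]; then
    echo major
  elif [ "$new_minor" -gt "$old_minor" ]; then
    echo minor
  elif [ "$new_patch" -gt "$old_patch" ]; then
    echo patch
  else
    echo none
  fi
}

change_type 1.4.2 1.5.0   # prints: minor
```

A pipeline could then branch on the printed value, for example to require a manual approval step for major releases.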
Let's start this post by setting the stage first and then move onto the problem. When a build is kicked off for your application/library/etc. on a CI (continuous integration) system like Travis CI or AppVeyor, you are most probably flowing a version number for that build no matter what type of tech stack you use. This is mostly to relate the artifacts, which the build will produce (e.g. Docker images, NuGet packages, .NET assemblies, etc.), with a particular context. This is really useful to be able to communicate and correlate stuff. A few scenarios:
- Hey Mark, please take a look at foobar-1.2.3-rc.657 from our CI Docker registry. That has the issue I have mentioned. You can check it on that image.
- Ow, barfoo-2.2.3-beta.362 NuGet package content misses a few assemblies that should have been there. Let's go back to build logs for this and check what went wrong.
Convinced? Good :) Otherwise, you won't find the rest of the article useful.
The other case is to flow a version number when you actually want to produce a release for your defined environments (e.g. acceptance, staging, production). In this case, you usually don't want to give an arbitrary version to your artifacts because the version will carry the high level information about the changes. There are three important intentions you can give here:
- I am releasing something which has no behavior changes
- I am releasing a new feature which doesn't break my existing consumers
- Dude, brace yourself! I will break the World into half!
You can see Semantic Versioning 2.0.0 for more information about this.
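The three intentions above map onto the patch, minor and major parts of a version. As a rough sketch, a small Bash function could compute the next release version from the intent. The next_release name is made up for illustration, and the function assumes a plain MAJOR.MINOR.PATCH input.

```shell
#!/bin/bash
# Hypothetical helper: compute the next release version from the intent.
# Assumes a plain MAJOR.MINOR.PATCH input (no prerelease or build metadata).
next_release() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  case "$2" in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    major) major=$((major + 1)); minor=0; patch=0 ;;
    *) echo "unknown intent: $2" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}

next_release 1.4.2 patch   # no behavior changes        -> 1.4.3
next_release 1.4.2 minor   # new, non-breaking feature  -> 1.5.0
next_release 1.4.2 major   # breaking change            -> 2.0.0
```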
So, what happens here is that we want to let the CI system decide on the version in some cases and take control over which version number to flow in some other cases. Actually, the first statement is not quite correct because you still want to have partial control over what version number to flow for your non-release builds. Here is an example case to highlight what I mean:
- You started developing your application and shipped version 1.0.0.
- Your CI system started flowing prerelease versions based on 1.0.0 and also attached the build number to that version (e.g. 1.0.0-beta.54). Notice that it's wrong at this stage because you already shipped v1.0.0. So, it should really be something like 1.0.1-beta.54.
- Now, you are shipping version 1.1.0 as you introduced a new feature.
- After that change, you keep building the software and the CI system keeps flowing 1.1.0-based prerelease versions, which sort before the 1.1.0 you have already shipped. This is a bit bad as you now lose the correlation between chronological order and version order.
So, what we want here is to assign a version based on the latest release version, which means that you want to have control over this process of assigning a version number. I have seen people keep a text file inside the repository to hold the latest release version but that's a bit manual. I assume you kick off a release somewhere and you already assign a version at that stage for releases. So, wouldn't it be good to leverage this?
So, you probably understood my problems here :) Now, let me introduce a few key pieces which will play a role to solve this problem and then later, I will move onto the actual implementation to solve the problem.
Tagging is a feature of Git which allows you to mark specific points in repository's history. As the Git manual also states, people typically use this functionality to mark release points. This is super convenient for our needs here and gets two important things sorted for us:
- A kick-off point for releases. Ultimately, release process will be kicked off when you tag a repository and push that tag to your remote.
- Deciding the base version based on the latest release version.
So, we have the tags. However, it doesn't mean that every tag is a valid version, as you can also use Git's tagging feature for other purposes. This is where SemVer comes into the picture: you can safely assume that any tag which is a valid SemVer is for a release. This makes your life so much easier as you can rely on existing tools like node-semver to help you out (as we will see shortly).
The other thing we have in the mix is to be able to increment the build version after a release. For example, we release version 2.5.6. The next build right after the release should have the version number bigger than 2.5.6. Seems easy as you can just increment the patch version, right? No! 2.5.6-beta is also a valid SemVer. We can go further with 2.5.6-beta.5+736287 which is also a valid SemVer. So, there is a pre-defined spec here and we can again leverage tools like node-semver to work with this domain nicely.
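To show why a plain patch increment is not enough, here is a small pure-Bash sketch that roughly mimics what I understand `semver -i prerelease` to do for simple inputs. The next_prerelease function is hypothetical and simplified: it only looks at the last prerelease identifier and drops build metadata, so treat it as an illustration of the rules rather than a replacement for node-semver.

```shell
#!/bin/bash
# Hypothetical helper that roughly mimics `semver -i prerelease` for simple inputs.
# Assumption: only the last prerelease identifier is inspected.
next_prerelease() {
  local version=${1%%+*}          # build metadata does not take part in ordering
  local core=${version%%-*}       # MAJOR.MINOR.PATCH
  local pre=
  [ "$version" != "$core" ] && pre=${version#*-}

  if [ -z "$pre" ]; then
    # A released version: move to the next patch and start prereleases at 0.
    local major minor patch
    IFS=. read -r major minor patch <<< "$core"
    echo "$major.$minor.$((patch + 1))-0"
    return
  fi

  local last=${pre##*.}
  if [[ $last =~ ^[0-9]+$ ]]; then
    # Numeric last identifier: increment it and keep whatever came before it.
    local prefix=
    [ "$pre" != "$last" ] && prefix=${pre%.*}.
    echo "$core-$prefix$((last + 1))"
  else
    # No numeric identifier to increment: append one.
    echo "$core-$pre.0"
  fi
}

next_prerelease 2.5.6                 # 2.5.7-0
next_prerelease 2.5.6-beta            # 2.5.6-beta.0
next_prerelease 2.5.6-beta.5+736287   # 2.5.6-beta.6
```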
Solution and Bash Implementation
OK, all this information is super useful but how do we make it work? Let me walk you through a solution I have introduced recently on a few of the projects I am working on. It's trivial but useful at the same time. However, keep in mind that there might be a few things I have missed as I have not been applying this for a long time. In fact, there might even be better techniques for this that you know. If so, please comment here. I would love to hear them!
I want to explain this in two stages and bring them together at the end.
Deciding on a Base Version
When the build is kicked off, one of the first things to do is to decide a base version. This is fairly trivial and here is the flow chart to describe this decision making process:
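Here is what the implementation looks like in Bash:
#!/bin/bash

# fall back to 0.0.0-0 when the repository has no valid SemVer tags
baseVersion=0.0.0-0

# semver prints only the valid versions, sorted, and exits non-zero when there are none
if semver "ignorethis" $(git tag -l) &>/dev/null
then
    # take the highest valid tag and increment its prerelease identifier
    baseVersion=$(semver $(semver $(git tag -l) | tail -n1) -i prerelease)
fi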
Keep in mind that I am fairly new to Bash. So, there might be wrong/bad usages here.
To explain what happens here with a bit more details:
- We get all the tags for the repository as a list by running git tag -l
- We pass this list to semver command-line tool to filter the invalid SemVer strings. Notice that there is another parameter we pass to semver here called "ignorethis". It's just there to cover cases when there is no tag so that semver command-line tool can return non-zero exit code.
- If semver command-line tool exits with 0, we know that there is at least one tag which is a valid SemVer. So, we run tail -n1 on the semver output to retrieve the latest version and we increment it on its prerelease identifier. This is now our base version.
- If there are no valid SemVer tags on the repository, we set 0.0.0-0 as the base version.
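If the semver command-line tool is not available on the build machine, the "pick the latest release tag" step can be approximated with standard tools. This is only a sketch under a few assumptions: release tags are plain MAJOR.MINOR.PATCH (optionally prefixed with "v"), GNU sort with the -V option is installed, and latest_release_tag is a name I made up for the example.

```shell
#!/bin/bash
# Hypothetical alternative that avoids the semver CLI.
# Assumptions: release tags are plain MAJOR.MINOR.PATCH (optionally prefixed
# with "v"), and GNU sort with -V (version sort) is available.
latest_release_tag() {
  grep -E '^v?[0-9]+\.[0-9]+\.[0-9]+$' | sed 's/^v//' | sort -V | tail -n1
}

# Tags are read from stdin, so inside a repository this would be:
#   git tag -l | latest_release_tag
printf '%s\n' 1.2.0 nightly v1.10.0 1.9.3 | latest_release_tag   # 1.10.0
```

Note that this version ignores prerelease tags completely, which is stricter than what the semver tool does.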
Decide on a Build Version
Now we have a base version and we now need to decide on a build version based on that. This is a bit more involved but again, very trivial to implement. Here is another flow chart to describe this decision making process:
And here is what the implementation looks like in Bash (specific to Travis CI as it uses Travis CI specific environment variables):
if [ -z "$TRAVIS_TAG" ]; then
    if [ -z "$TRAVIS_BRANCH" ]; then
        # can add the build metadata to indicate this is pull request build
        echo export PROJECT_BUILD_VERSION="$baseVersion.$TRAVIS_BUILD_NUMBER";
    else
        # can add the build metadata to indicate this is a branch build
        echo export PROJECT_BUILD_VERSION="$baseVersion.$TRAVIS_BUILD_NUMBER";
    fi
else
    if ! semver "$TRAVIS_TAG" &>/dev/null
    then
        # can add the build metadata to indicate this is a tag build which is not a SemVer
        echo export PROJECT_BUILD_VERSION="$baseVersion.$TRAVIS_BUILD_NUMBER";
    else
        echo export PROJECT_BUILD_VERSION=$(semver "$TRAVIS_TAG");
    fi
fi
Notice that I am echoing commands rather than directly calling them. This is because Travis CI doesn't flow the exports which happen inside a script file. Maybe it does but I was not able to get it working. Anyway, I am calling this script inside my .travis.yml file by evaluating the output like this: eval $(./scripts/set-build-version.sh)
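My best guess at the reason is that the script runs as a child process, and a child process cannot change the environment of the shell that started it. The sketch below demonstrates that general shell behavior outside of Travis CI. DEMO_VERSION is a made-up variable name, and the example assumes it is not already set in your environment.

```shell
#!/bin/bash
# A child process cannot change the environment of its parent,
# so an export made inside a separately executed script is lost.
bash -c 'export DEMO_VERSION=1.2.3'
echo "after running a child: ${DEMO_VERSION:-<unset>}"   # <unset>

# Printing the export and evaluating it in the current shell works.
eval "$(bash -c 'echo export DEMO_VERSION=1.2.3')"
echo "after eval: $DEMO_VERSION"                         # 1.2.3
```

Sourcing the script instead of executing it would presumably work as well, since a sourced script runs in the current shell.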
I am not going to separately explain how this works as the flow chart is very easy to grasp (and so is the Bash script). However, one thing which is worth mentioning is the branch check. After we check if the build is for a branch, we do the same operation no matter what. This is OK for my use case but you can add special metadata to your version in order to indicate which branch the build happened on or whether it was a pull request.
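As a sketch of what that metadata could look like, the function below appends SemVer build metadata (the part after "+") to the version. The build_version function and the "pr." and "branch." prefixes are my own invention for this example. It assumes that Travis CI sets TRAVIS_PULL_REQUEST to the pull request number (or "false") and TRAVIS_BRANCH to the branch name, which you would pass in as arguments.

```shell
#!/bin/bash
# Hypothetical helper: append SemVer build metadata describing the build type.
# Assumptions: $3 is TRAVIS_PULL_REQUEST (pull request number or "false")
# and $4 is TRAVIS_BRANCH (the branch name).
build_version() {
  local base=$1 build_number=$2 pull_request=$3 branch=$4
  local version="$base.$build_number"

  if [ -n "$pull_request" ] && [ "$pull_request" != "false" ]; then
    echo "$version+pr.$pull_request"
  else
    # Build metadata may only contain [0-9A-Za-z-], so replace anything else.
    echo "$version+branch.${branch//[^0-9A-Za-z-]/-}"
  fi
}

build_version 1.0.1-0 54 false feature/login   # 1.0.1-0.54+branch.feature-login
build_version 1.0.1-0 54 17 master             # 1.0.1-0.54+pr.17
```

Build metadata is ignored when versions are compared, so adding it should not change how the builds are ordered.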
I find this solution a very straightforward way to pick the version of the build and have a central way of kicking off a release process. I applied this on the AspNetCore.Identity.MongoDB project, a MongoDB data store adapter for ASP.NET Core Identity. You can also see how I am setting the build version, how I am using it and how I am kicking off a release process.
To bring everything together, here is the entire script to set the build version:
#!/bin/bash

# fall back to 0.0.0-0 when the repository has no valid SemVer tags
baseVersion=0.0.0-0

if semver "ignorethis" $(git tag -l) &>/dev/null
then
    # take the highest valid tag and increment its prerelease identifier
    baseVersion=$(semver $(semver $(git tag -l) | tail -n1) -i prerelease)
fi

if [ -z "$TRAVIS_TAG" ]; then
    if [ -z "$TRAVIS_BRANCH" ]; then
        # can add the build metadata to indicate this is pull request build
        echo export PROJECT_BUILD_VERSION="$baseVersion.$TRAVIS_BUILD_NUMBER";
    else
        # can add the build metadata to indicate this is a branch build
        echo export PROJECT_BUILD_VERSION="$baseVersion.$TRAVIS_BUILD_NUMBER";
    fi
else
    if ! semver "$TRAVIS_TAG" &>/dev/null
    then
        # can add the build metadata to indicate this is a tag build which is not a SemVer
        echo export PROJECT_BUILD_VERSION="$baseVersion.$TRAVIS_BUILD_NUMBER";
    else
        echo export PROJECT_BUILD_VERSION=$(semver "$TRAVIS_TAG");
    fi
fi
I hope this will be useful to you in some way and as said, if you have a similar technique or a practice that you apply for this case, please share it. Now, go and enjoy this spectacular weekend ;)
If you have ever been to London, you probably know how complicated the London tube map looks :)
There is no way for me to understand this unless I look at it 100 times (and that was actually the reality).
Most of the Git repository commit histories I look at nowadays are not much different from this map. For example, I cloned the original Git source code and wanted to look at its commit history. I wish I didn’t do that as my eyes hurt really bad:
This is the view of gitk but it doesn’t matter what you use here. It looks as bad as the above one when you are just looking at the log:
Maybe I am being a little skeptical about it and maybe, it’s useful information there. Let’s look a little closer to see what it actually says:
90% of the commits above are rubbish to me. YES, I am talking about those meaningless merge commits. They have no real value; they are just implementation details that happened during development and they make no sense when I look at them later. I previously blogged about how rebase can help take away this pain. However, it’s hard to apply the rebasing model if you really don’t know what you are doing. I wanted to dig a little deeper in this post and share my opinions on this controversial topic.
How We Are Ending up With This Mess
Let’s take a very simple example and work on that to simulate how we are ending up with a mess like this. I have two repositories: one is called london-tube-main (upstream) and the other one is london-tube-fork (origin). I only have one commit under upstream/master branch and I have one more additional commit under origin/doing-stuff branch as you can see below:
All seems good so far. I have the stable code inside upstream/master and I am working on new stuff under origin/doing-stuff. Now, let’s make a new commit to upstream/master while we continue cracking on origin/doing-stuff.
At that point, the origin/doing-stuff branch is out of sync. The logical option here is to sync with upstream/master before continuing to add new stuff. In order to do that, I ran git merge upstream/master when I was under origin/doing-stuff. If I look at what happened there after running it, I would see something like this:
Git merge manual explains what actually happened here really well and I copied the description by changing only the references:
Then "git merge upstream/master" will replay the changes made on the upstream/master branch since it diverged from master (i.e., 1) until its current commit (3) on top of master, and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.
Our simple progress flow would look like this now:
At this state, I am in a feature branch working on my stuff. I now recorded a commit practically saying "I synced my branch, yay me!" which is pointless when you want to get this work into your stable branch. Let’s add one more commit on origin/doing-stuff. At the end, we have the following look:
Apart from the unnecessary merge commit, it is not that bad but this can get worse. Right now if you want to merge the origin/doing-stuff branch into upstream/master, it will be a fast-forward merge, which means only updating the branch pointer without creating a merge commit. However, some prefer to disable this behavior with the --no-ff switch for the merge command, which creates a merge commit even when the merge resolves as a fast-forward. This makes the history look even worse.
Repeat this process over and over again and you will be really close to your own version of London tube map™!
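The merge-based sync described above can be reproduced in a throwaway repository. This is only a sketch: a local master branch plays the role of upstream/master, the commit helper and the file names are made up, and the numbered commit messages follow the example.

```shell
#!/bin/bash
# Throwaway repository; "master" plays the role of upstream/master here.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
commit() { echo "$1" > "file-$1.txt"; git add .; git commit -q -m "$1"; }

commit 1
git checkout -q -b doing-stuff
commit 2
git checkout -q master
commit 3

# Sync the feature branch by merging; --no-edit accepts the default message.
git checkout -q doing-stuff
git merge -q --no-edit master

git log --oneline --graph
echo "merge commits on doing-stuff: $(git rev-list --count --merges HEAD)"
```

The graph printed by git log shows the extra merge commit sitting on top of commits 2 and 3.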
How Can We Get Rid of This Mess
Let’s go back to one of our earlier states on our example:
To refresh our memories, we now want to keep working on the origin/doing-stuff branch but we are out of sync with the upstream/master branch. Previously, we blindly merged upstream/master into the origin/doing-stuff branch but this time we will rebase origin/doing-stuff onto upstream/master:
We are now in sync with upstream/master, and what happened here is actually really clever. When you do the rebase, Git first sets your new commits aside and moves your branch to the tip of upstream/master. Then, it applies each of your commits one by one on top of that. If you don’t have any conflicts, it will be a successful rebase as it was in our case here.
Notice that the 2 commit is now new-2 commit. In reality, commits are identified by SHA1 hash and when you do a rebase, hash of your new commits will be recalculated as the parent commit is now changed, which will result in a rewritten history inside your feature branch. If you try to push your changes to origin/doing-stuff branch now, it will fail as the history is changed:
You can, however, force your changes to be pushed with the git push --force command, which will practically replace the remote history with your local one:
You should really avoid doing force pushes against a branch which you share with somebody else. You really don’t want to piss your mates off :)
It’s generally OK to force push into a feature branch inside your own fork when you are the only one working on it. Also, if you have an open pull request on GitHub attached to that branch, the force push will update the pull request, which is nice. The general rule of thumb here is that you should work on your own fork and shouldn’t push a feature branch into the shared upstream repository. This will generally make your life easier.
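Here is a sketch of the rebase and force push flow in a throwaway setup. A local bare repository stands in for the fork (origin), a local master branch stands in for upstream/master, and the commit helper is made up for the example.

```shell
#!/bin/bash
set -e
work=$(mktemp -d)
git init -q --bare "$work/fork.git"            # stands in for origin (my fork)
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$work/fork.git"

# Hypothetical helper: one commit per call, each touching its own file.
commit() { echo "$1" > "file-$1.txt"; git add .; git commit -q -m "$1"; }

commit 1
git checkout -q -b doing-stuff
commit 2
git push -q origin doing-stuff
git checkout -q master
commit 3                                        # "upstream/master" moves on

git checkout -q doing-stuff
old_hash=$(git rev-parse HEAD)
git rebase -q master
new_hash=$(git rev-parse HEAD)                  # same change, different hash

# A plain push is rejected because the remote branch is no longer an ancestor.
if git push -q origin doing-stuff 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
git push -q --force origin doing-stuff          # the remote now matches the local history
echo "plain push: $plain_push"
echo "commit 2: $old_hash -> $new_hash"
```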
When you now add new commits and try to get these changes into upstream/master, it will just be a fast-forward merge (unless you have --no-ff merge rule in place).
Now, the history is so much cleaner:
If you are on a long-term, big project where more than one person is involved, merging like this in an obnoxious way will make you and your team suffer. You will feel the same way you felt when you first landed in London and picked up the London tube map: confused and terrified. I agree that applying rebasing is hard but spend some time on this to get it right throughout your team. Do not even hesitate to spend a day practicing the flow in order to get it right. It may not seem important but it really is when you need to dig into the history of the code (e.g. git bisect). Here are a few simple rules that I follow which may also be helpful for you:
- Do your dirty stuff inside your own fork (you can go crazy, no one will care).
- Force --ff-only for merges into master: git config branch.master.mergeoptions "--ff-only"
- When you are on a feature branch, always rebase or pull --rebase.
- To make the history look more meaningful and modular, make use of git add -p and git rebase -i
- Never rebase a shared branch onto your feature branch or force push to a shared branch. There is a really good chance of pissing your coworkers off by doing this :)
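The configuration-related rules can be applied per repository as sketched below. The pull.rebase setting is my addition to cover the "pull --rebase" rule, and the example uses a throwaway repository so that it does not touch your real configuration.

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Only allow fast-forward merges into master (stored in this repository's config).
git config branch.master.mergeoptions "--ff-only"
# Make `git pull` rebase local commits instead of merging them.
git config pull.rebase true

git config --get branch.master.mergeoptions   # --ff-only
git config --get pull.rebase                  # true
```

If you want fast-forward-only merges for every branch in every repository, git config --global merge.ff only should do that, but try it on a test repository first.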
You may wonder why the title starts with "Basics". The answer is simple: I know only the basics of git rebase :) It's only one of the powerful features of git and it allows you to have a clean history in a highly branching workflow. "Rebase" is quite powerful as mentioned and what I'm about to show you is only one of the reasons to use rebase. I highly recommend Keith Dahlby's NDC talk, in which he took some time to show the rebase feature.
Let's see the easiest sample where rebase comes handy. We have the following history where we have two branches: master and feature-1.
Typically, what you would do here is merge the feature-1 branch into master, which is fairly reasonable and it works. However, it creates an unnecessary commit plus a ridiculous graph which would be a mess if you think of hundreds of branches:
What you can do with rebase is to patch the feature-1 branch onto master. Later then, you can merge from there. The following command is what you need to run:
After running the rebase command, we can run "gitk --all" to see the graph:
It's now a nice, clean history. Notice that master is still pointing where it was. It's because we haven't merged the feature-1 branch yet. Let's check out the master branch and run "git merge feature-1" to merge the feature-1 branch into the master branch:
Nicely done! Open up the gitk one more time and see the clean history:
After we remove the feature-1 branch by running "git branch -D feature-1", we won't have any trace of the feature-1 branch, which is absolutely OK as feature branches are just implementation details, that's all.
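Put together, the whole flow could look like the following sketch in a throwaway repository. The exact rebase command from the walkthrough is not reproduced here, so "git rebase master" run from the feature-1 branch is my assumption of what it was. The commit helper and the file names are made up as well.

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
commit() { echo "$1" > "file-$1.txt"; git add .; git commit -q -m "$1"; }

commit 1
git checkout -q -b feature-1
commit 2
git checkout -q master
commit 3

git checkout -q feature-1
git rebase -q master                 # replay feature-1 on top of master
git checkout -q master
git merge -q --ff-only feature-1     # just moves the master pointer forward
git branch -q -D feature-1           # no trace of the feature branch remains

git log --oneline --graph --all      # a straight line with no merge commit
```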
Rebase can hurt
With git rebase, at the very basic level, you are messing with the history, which can be dangerous depending on the case. On the other hand, when you have a conflict, it's no picnic to resolve it during an interactive rebase without deep firsthand knowledge, but it's worth looking into even if it seems hard at first glance.
We are all in love with Git but without GitHub, we love Git less. On GitHub, we can maintain our projects very efficiently. The "Pull Requests" and "Issues" features of GitHub are the key factors for that IMO. You can even send yourself a pull request from one branch to another and discuss that particular change with your team. As your discussion flows, your code can flow accordingly, too. This is just one of the many cool features of GitHub.
There is a cool Git extension for GitHub which is maintained by one of the founders of GitHub: Chris Wanstrath. This extension, named hub, lets us work with GitHub more efficiently from the command line and perform GitHub specific operations easily, like sending pull requests, forking repositories, etc. It’s fairly easy to install on other platforms as far as I can see but it’s not that straightforward on Windows.
You should first go and install msysgit on Windows; I am assuming most of us are using this for Git on Windows. Secondly, we should install Ruby on Windows. You can install Ruby on Windows easily through RubyInstaller.
After installing Ruby on our machine successfully, we should add the bin path of Ruby to our system PATH variable. In order to do this, press Windows Key + PAUSE BREAK to open up the Windows System window and click the "Advanced system settings" link on the left hand side of the window.
A new window should appear. From there, click "Environment Variables..." button to open up the Environment Variables window.
From there, you should see "System variables" section. Find the Path variable and concatenate the proper ruby bin path to that semicolon-separated list.
The last step is actually installing hub. You should grab the standalone file and then rename it to "hub". Then, put it under the Git\bin folder. The full path of my Git\bin folder on my 64-bit machine is "C:\Program Files (x86)\Git\bin".
Now you should be able to run hub command from Git Bash:
The special GitHub commands you get through the hub extension are nicely documented in the "Readme" file of the project. I think the coolest feature of hub is the pull-request feature. On GitHub, you can send pull requests to another repository through the GitHub web site or the GitHub API, and the hub extension uses the GitHub API under the covers to send pull requests. You can even attach your pull request to an existing issue. For example, the following command sends a pull request to the master branch of tugberkugurlu’s repository from the branch that I am currently on and attaches it to the existing issue #1.
hub pull-request -i 1 -b tugberkugurlu:master