git-based fabric deploys are awesome

When I was pointed to Python Deployment Anti-Patterns by a colleague I was a little shocked to see that the way we had been deploying applications with fabric and git over the past two years (over 1500 deployments) with no problems was being called an Anti-Pattern. There are definitely many ways to deploy software applications and they all have their pros and cons. Our process is by no means perfect but the way that we use git within fabric is definitely one of the best parts of our deployment process.

In his follow-up article Hynek made the case that deploying with native packages is better. On my team we actually started out deploying packages but since developers deploy we got sick of waiting for the packages to build and upload so we switched to git-based deploys. Packages are, of course, a valid way to deploy software, but I think the criticisms leveled against fabric git-based deploys might have been against doing these deploys in a specific way. I'm writing this article to show you how we have been successful using git-based fabric deployments.

I agree with many of his points:

"Don't use ancient system Python versions"
"Use virtual environments"
"Look into alternatives to Apache + mod_wsgi setups"
"Don't run your daemons in a tmux/screen"

Upstart is my personal favorite because it is very stable and the configuration is succinct. Here's an example of a daemon that I've had running on one of my personal projects for several years with no issues:

start on runlevel [12345]
stop on runlevel [0]

respawn

exec sudo -u www-data PATH=path/to/app VIRTUAL_ENV=path/to/virtual_env path/to/python_server_script

Why anyone would want to write a billion line init script now that upstart exists is beyond me. Perhaps they don't know about upstart. It could also be that they are stuck on CentOS or RedHat. My heart goes out to you if that's the case. I know how that feels.

Here are some of the points I disagree with:

Configuration is not part of the application

I've seen others make this same claim and on the face of it it makes sense up to a point. On my team developers deploy so we keep templates of configurations and the differences are kept in context variables that are passed into the templates. If there is sensitive information we keep it outside of version control. Really, if you want to test changes from dev through staging and onto production why not keep the configuration as similar as possible? On projects where teams are creating very generic apps that are being deployed with many different configurations I understand the need for this but most web application developers are deploying to a very specific target (production). It makes sense to keep your development settings as close to that target as possible. For example, if staging and production have the ENCRYPT_STUFF setting set to TRUE then your development environment should have it set too. But they should all have different keys and the production setting should be kept out of version control.

What's wrong with Fabric+git-pull?

It doesn't scale. As soon as you have more than a single deployment target, it quickly becomes a hassle to pull changes, check dependencies and restart the daemon on every single server. A new version of Django is out? Great, fetch it on every single server. A new version of psycopg2? Awesome, compile it on each of n servers.

Fabric will roll through all commands on all servers in a predictable manner one after the other. That way they can be taken out of the load balanced pool before the service is HUP'd and put them back in after it comes back. If this is done automatically with unattended package upgrades (as proposed later in the article) isn't there the possibility that all your servers become unavailable at the same time?

You should always run pip and if there is nothing to upgrade it will simply do nothing. There's no need to download all of the packages - you can have them seeded on each server before starting the upgrade.

It's hard to integrate with Puppet/Chef. It's easy to tell Puppet "on server X, keep package foo-bar up-to-date or keep it at a special version!" That's a one-liner. Try that while baby sitting git and pip.

I can't speak to integrating fabric with Puppet and Chef but it's basically a one-liner to update a remote target with fabric:

cd path/to/git/repo && git reset --hard [deployment-sha1] && pip install -r path/to/requirements.txt

It can leave your app in an inconsistent state. Sometimes git pull fails halfway through because of network problems, or pip times out while installing dependencies because PyPI went away (I heard that happens occasionally cough). Your app at this point is – put simply – broken.

A git pull will not leave your app in an inconsistent state. If the network fails it won't change your working copy and fabric will stop the script because git will return an error. That said I don't think you should use git pull anyway since it is one more moving part that can fail during deployment and it requires that your private repository be open to the world. Since git is distributed a developer can push their repo's immutable store to the target using git push during deployment. Running git reset --hard [deployment-sha1] after the push is finished will update the working copy. Since there is a repo on the other end you'll only be sending the new objects since the last push to the target. This is why git-based deploys beat packages speed-wise. Most of our code deploys take a fraction of a second.

Even a private PyPI mirror can fail. Why not upload the packages to the target and run pip like this?

pip install --no-index --find-links file:///[local-path-to-packages] -r requirements.txt

You could even store your packages in a git submodule and sync your submodules at the same time. (We sync submodules as well, it's only a little extra work.)

Weird race conditions can happen. Imagine you're pulling from git and at the same time, the app decides to import a module that changed profoundly since its last deployment. A plain crash is the best case scenario here.

When you install with a package you have to stop and restart the app. You need to do the same thing if you use git and fabric. With git, it takes much less time to update because only the modified files are swapped out. Packages copy whole trees of files many of which are most likely not modified between releases so the app will be down longer while this disk IO takes place.

Check out the gitric fabric module I wrote that performs git deployments in the way I've described above.

One other valid problem I've heard raised about git-based deploys is that you can end up with cruft in your working copy that sticks around like .pyc files where the original .py file is deleted and there is the chance that this file could still be imported even though the original .py was deleted. Since cloning a local git repository uses hard links you can seed your remote repository and then clone it locally on the same machine (even for slightly large projects this only takes a little extra time). Stop your server, move the old repository out of the way and move the new cloned repo where the old one was (or use a current symlink) and then restart the server.

Git-based deployments make sense for scripting languages where there isn't a compile step so the repo can be sent as-is to production (so it wouldn't make sense for a Java application). It's worth harnessing git to make deployments faster. If we only had to deploy once a month we might've settled for package-based deployments but we push often and got sick of waiting for packages to build and upload.

Packages are, of course, a legitimate way to push out changes but the downside of deploying with packages is that it takes time to build them and upload them
Git + fabric is suitable for deployments (my team has deployed using it over 1500 times)
git-based deployments are lightning fast to deploy and roll back
- There is no build step
- You only have to upload objects that have changed
Use git push, don't use git pull
- It's one less moving part that can fail during deployment
- You don't have to open your git repository to the world
- You can pre-seed git's immutable object store without affecting your running application
You can have pip use local packages which are more reliable and you also avoid having to set up a PyPI mirror