Having worked on six e-commerce websites (half of which make millions of dollars in revenue every year), I can safely say that downtime is a surefire way to upset the business side of any company. Time, after all, is money. I've worked with teams that have tried to minimize the downtime incurred by releases in many different ways. Here are some of the extremes:

On one end of the spectrum you can avoid downtime during deployments by only deploying during maintenance windows. The downside here is pretty obvious - what if the release introduces a bug and you don't find out until a peak traffic period? In shops that deployed this way, I've seen people throw their hands in the air and say "I guess our customers can't use functionality X and will keep getting that error until we can deploy tomorrow morning." I've also had a front-row seat when a site I was working on was brought down for an emergency deployment and we were inundated with customer complaints.

On the other side of the spectrum I've seen blue/green phoenix deployments - rebuilding each and every VM with the same software but a new version of the application. After testing is done on the new VMs, you cut over - with either a hardware switch or software like HAProxy - so traffic points to the new version. Needless to say, this method takes a very long time if all you want to do is deploy a one-line fix. If you aren't familiar with blue/green deployments, be sure to check out Martin Fowler's article about them.

There is a Goldilocks solution to this problem that won't take down a site and won't take as long as a full blue/green phoenix deployment. That said, as with all technical solutions, it isn't without its own drawbacks and might not be right for every deployment.

Here is the ridiculously simple Flask application I'll be deploying as an example:

import os, time

from flask import Flask
app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello 0-downtime %s World!" % os.environ.get('BLUEGREEN', 'bland')

Here is the fabfile we will use:

import os
import sys
from StringIO import StringIO

from fabric.api import task, local, run
from fabric.operations import put
from fabric.state import env

sys.path.append('../')
from gitric.api import (  # noqa
    git_seed, git_reset, allow_dirty, force_push,
    init_bluegreen, swap_bluegreen
)


@task
def prod():
    env.user = 'test-deployer'
    env.bluegreen_root = '/home/test-deployer/bluegreenmachine/'
    env.bluegreen_ports = {'blue': '8888',
                           'green': '8889'}
    init_bluegreen()


@task
def deploy(commit=None):
    if not commit:
        commit = local('git rev-parse HEAD', capture=True)
    env.repo_path = os.path.join(env.next_path, 'repo')
    # Push the commit to the next (non-live) environment and check it out
    git_seed(env.repo_path, commit)
    git_reset(env.repo_path, commit)
    # Stop any gunicorn left running from the last deploy to this side
    run('kill $(cat %(pidfile)s) || true' % env)
    run('virtualenv %(virtualenv_path)s' % env)
    run('source %(virtualenv_path)s/bin/activate && '
        'pip install -r %(repo_path)s/bluegreen-example/requirements.txt'
        % env)
    # Point this environment's nginx include at its gunicorn port
    put(StringIO('proxy_pass http://127.0.0.1:%(bluegreen_port)s/;' % env),
        env.nginx_conf)
    # Start gunicorn daemonized (-D) on this environment's port
    run('cd %(repo_path)s/bluegreen-example && PYTHONPATH=. '
        'BLUEGREEN=%(color)s %(virtualenv_path)s/bin/gunicorn -D '
        '-b 0.0.0.0:%(bluegreen_port)s -p %(pidfile)s app:app'
        % env)


@task
def cutover():
    # Swap the live and next symlinks, then reload nginx gracefully so
    # new requests flow to the freshly deployed side
    swap_bluegreen()
    run('sudo /etc/init.d/nginx reload')

The updates in deploy should be idempotent - you should be able to run deploy multiple times and get the same result each time (aside from the pids of the workers that are started). One tricky bit when you harness git for your deployments is that you want to clean up your remote working copy. I didn't do this in the example, but you can use git clean to make sure only the files tracked in the repository end up in the working copy. I did this with Python, but you can substitute any language that doesn't require a binary build step and has a way of installing isolated packages - it could be done with Ruby and RVM, for example. I also have a nodejs example in the gitric repository.
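As a sketch of that cleanup step, here is a local, hypothetical helper (in the fabfile you would run the equivalent git clean command over SSH with run()):

```python
import os
import subprocess


def clean_working_copy(repo_path):
    # Remove untracked and ignored files (-x also drops stale artifacts
    # like old *.pyc caches) so only content tracked by git survives
    # between releases.
    subprocess.check_call(['git', 'clean', '-fdx'], cwd=repo_path)
```

In the fabfile itself this boils down to a one-liner along the lines of run('cd %(repo_path)s && git clean -fdx' % env).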

The directory structure that gets built out looks like this:

├── blue
│   ├── env
│   ├── etc
│   └── repo
├── green
│   ├── env
│   ├── etc
│   └── repo
├── live -> /home/test-deployer/bluegreenmachine/green
└── next -> /home/test-deployer/bluegreenmachine/blue
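The cutover itself is just a symlink dance: whatever next points at becomes live, and vice versa. Here is a rough local sketch of what gitric's swap_bluegreen accomplishes on the server (an assumption on my part - the real task runs the equivalent commands remotely):

```python
import os


def swap_symlinks(root):
    # Read where each pointer currently leads...
    live, nxt = os.path.join(root, 'live'), os.path.join(root, 'next')
    live_target, next_target = os.readlink(live), os.readlink(nxt)
    # ...then swap them.  os.symlink won't overwrite an existing link,
    # so create each new link under a temporary name and rename it into
    # place (rename replaces the destination atomically on POSIX).
    for link, target in ((live, next_target), (nxt, live_target)):
        tmp = link + '.swap'
        os.symlink(target, tmp)
        os.rename(tmp, link)
```

Because the nginx config only ever references the live and next paths, nothing else needs to change at cutover time.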

To do the initial build-out, all you need is an automation user on your remote server and an nginx host entry set up something like this:

server {
    listen 80;
    server_name server.name.here;

    location / {
        include /home/test-deployer/bluegreenmachine/live/etc/nginx.conf;
    }
}

server {
    listen 80;
    server_name next.server.name.here;

    location / {
        include /home/test-deployer/bluegreenmachine/next/etc/nginx.conf;
    }
}
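The included files are the one-line fragments the deploy task writes with put(); for the environment running on port 8888 the generated nginx.conf would contain just:

```nginx
proxy_pass http://127.0.0.1:8888/;
```

Swapping the live and next symlinks changes which fragment each server block includes, and nginx picks up the change on reload.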

Then you can run

fab prod deploy
fab prod cutover

These steps are intentionally separated so you can check the next environment before cutting over to the new release.

I cut over to a new release while running ab and continuously hitting the server with curl to see what the server was returning:

 % ab -c 100 -n 5000 http://my.server.here/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking my.server.here (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        nginx/1.4.1
Server Hostname:        my.server.here
Server Port:            80

Document Path:          /
Document Length:        28 bytes

Concurrency Level:      100
Time taken for tests:   33.180 seconds
Complete requests:      5000
Failed requests:        2576
   (Connect: 0, Receive: 0, Length: 2576, Exceptions: 0)
Total transferred:      922576 bytes
HTML transferred:       142576 bytes
Requests per second:    150.69 [#/sec] (mean)
Time per request:       663.607 [ms] (mean)
Time per request:       6.636 [ms] (mean, across all concurrent requests)
Transfer rate:          27.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      164  326  87.5    321    1393
Processing:   161  308 188.7    284    4045
Waiting:      161  307 186.5    284    4045
Total:        338  635 216.9    646    4409

Percentage of the requests served within a certain time (ms)
  50%    646
  66%    675
  75%    689
  80%    699
  90%    723
  95%    758
  98%    789
  99%    899
 100%   4409 (longest request)

My server is the tiniest VM Linode offers and I'm on the other side of the Earth from it, so I'm not concerned about the performance here. What I am checking is that every incoming request was served, without any downtime, while a release was deployed. You can see that ab counted 2576 length failures - those aren't actual failures. ab treats any response whose content differs from the first response it receives as a failure, and halfway through the load test I cut over to a new release:

 % for x in $(seq 100); do curl -s -S http://my.server.here/ && echo; done
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime blue World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!
Hello 0-downtime green World!

The special sauce is leveraging the reload functionality that most web servers (Apache, nginx) offer. Existing workers are told not to handle any new requests, and newly spawned workers proxy all traffic to the new version. Here is a trace from my server right after a cutover:

COMMAND    PID          USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME

nginx    13636          root    8u  IPv4 11283302      0t0  TCP *:80 (LISTEN)
nginx    29426      www-data    8u  IPv4 11283302      0t0  TCP *:80 (LISTEN)
nginx    29427      www-data    8u  IPv4 11283302      0t0  TCP *:80 (LISTEN)
nginx    29428      www-data    8u  IPv4 11283302      0t0  TCP *:80 (LISTEN)
nginx    29429      www-data    8u  IPv4 11283302      0t0  TCP *:80 (LISTEN)
nginx    29381      www-data   14u  IPv4 16961706      0t0  TCP SERVER_IP:80->PING_IP:46083 (ESTABLISHED)
nginx    29381      www-data   15u  IPv4 16961707      0t0  TCP localhost:48628->localhost:8889 (ESTABLISHED)
nginx    29429      www-data    5u  IPv4 16961753      0t0  TCP SERVER_IP:80->PING_IP:46084 (ESTABLISHED)
nginx    29429      www-data    6u  IPv4 16961754      0t0  TCP localhost:33233->localhost:8888 (ESTABLISHED)

gunicorn 29223 test-deployer    5u  IPv4 16953570      0t0  TCP *:8888 (LISTEN)
gunicorn 29340 test-deployer    5u  IPv4 16953579      0t0  TCP *:8889 (LISTEN)
gunicorn 29345 test-deployer    5u  IPv4 16953579      0t0  TCP *:8889 (LISTEN)
gunicorn 29345 test-deployer    9u  IPv4 16962807      0t0  TCP localhost:8889->localhost:48628 (ESTABLISHED)
gunicorn 29391 test-deployer    5u  IPv4 16953570      0t0  TCP *:8888 (LISTEN)
gunicorn 29391 test-deployer    9u  IPv4 16960496      0t0  TCP localhost:8888->localhost:33233 (ESTABLISHED)

root     13636  0.0  0.3  12920  3208 ?        Ss   Jun16   0:00 nginx: master process /usr/sbin/nginx
www-data 29381  0.0  0.2  12904  2104 ?        S    14:51   0:00 nginx: worker process is shutting down
www-data 29426  0.0  0.1  12920  1888 ?        S    14:52   0:00 nginx: worker process
www-data 29427  0.0  0.1  12920  1888 ?        S    14:52   0:00 nginx: worker process
www-data 29428  0.0  0.1  12920  1888 ?        S    14:52   0:00 nginx: worker process
www-data 29429  0.0  0.2  12920  2380 ?        S    14:52   0:00 nginx: worker process

nginx PID 29381 (labeled "nginx: worker process is shutting down") is handling an old request against the previous release and will shut down once it finishes. A request that came in after the cutover is going to port 8888 (the new release), and all future requests will go to the new nginx workers, which forward traffic to port 8888. These are the details of how nginx handles graceful reloads, but a complete understanding of them isn't necessary to harness the power of this deployment method.

Using git to deploy code in languages that don't require builds, like Python and Ruby, shortens the time it takes to package and deploy. I wrote about this a few years ago. Coupling that with blue/green deployment on the same server has made for a very pleasant deployment experience for me and my team over the past year and a half. Everyone takes turns deploying, and as our fleet of servers grows our deployment process won't get any slower, since we use Fabric's @parallel decorator during the update phase.

It takes a tiny amount of extra planning to write code and migrations that can be deployed without bringing down a live service, but with experimentation and practice you will find that it is not that much work. This video from the Disqus team is an amazing resource. You should prefix your memcache keys with a short git ref and warm up your cache before cutting over. With Postgres you can usually add new tables and even new NULL-default columns without problems, but you'll definitely want to test your migrations on a staging environment that simulates locked rows (if you use SELECT FOR UPDATE to ensure consistency). If you use a background worker like Celery, tasks might linger from previous versions, so you need to handle cases where the old API is called - give any new parameters you add default values:

@task
def process_order(order_id, resent=None):
    ...

If a process_order call with the old function signature is sitting in the queue when you deploy, it could fail unless the parameters you add have default values. These are just a few of the caveats I could think of. When in doubt, test deploying and rolling back on staging until you get the hang of it.
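The memcache prefixing mentioned above can be as simple as a helper that namespaces every key by release (a hypothetical helper - the name and the 8-character truncation are my own choices):

```python
def versioned_key(git_ref, key):
    # Namespace cache keys by the deployed release so the blue and green
    # environments never read each other's cached values across a cutover.
    return '%s:%s' % (git_ref[:8], key)
```

Warm the cache under the new prefix before running fab prod cutover and the new release never serves stale entries written by the old one.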

There are numerous reasons why you would want to deploy updates to an API or website without downtime, and the blue/green method described here delivers that without the cost of a full environment rebuild.

As I said above, there are countless techniques for deploying software, and they all have their trade-offs. Critics might say that this approach only works for language-level packages: OS-level packages, and the OS itself, can't be upgraded in isolation the way a virtualenv and the application can. I fully understand this point, and it is simply a trade-off. The future looks very bright when it comes to techniques that provide even more isolation and faster deployments, like Docker and similar projects, and I look forward to using tools like that to squeeze even more uptime out of the projects I work on. In the meantime, this porridge is just right for the type of projects I'm working on.