A Tired Raccoon’s Containerization Manifesto

Just Do The Containering

Some of you out there are still stuck on old deployment workflows that drop software directly onto shared hosts. Maybe it’s a personal thing that you just don’t have the energy to maintain particularly well. Maybe it’s a service at work stuck without any dedicated owner or maintenance resources that keeps limping along.

This post is a call to action for doing the minimum possible work to get it into a container, and to do that transition badly and quickly. I’ve done it for a bunch of minor things I maintain and it’s improved my life greatly; I just re-build the images with the latest security updates every week or so and let them run on autopilot, never worrying about what previous changes have been made to the host. If you can do it¹, it’s worth it.

Death is Coming

Your existing mutable infrastructure is slowly decaying. You already know that one day you’re going to log in and update the wrong package and it’s gonna blow up half of the software running on your box. Some .so is going to go missing, or an inscrutable configuration conflict will make some network port stop listening.

Either that or you’re not going to update religiously, and eventually it’ll get commandeered by cryptocurrency miners. Either way, your application goes down and you do a lot of annoying grunt work to get it back.

These boxes won’t survive forever. You’ve gotta do something.

Eat Trash

You don’t need to follow the daily churn of containerization “best practices” in order to get 95% of the benefit of containers. The huge benefit is just having a fully repeatable build process that can’t compromise your ability to boot or remotely administer your entire server. Your build doesn’t have to be good, or scalable. I will take 25 garbage shell scripts guaranteed to run isolated within a container over a beautifully maintained deployment system written in $YOUR_FAVORITE_LANGUAGE that installs arbitrary application packages as root onto a host any day of the week. The scope of potential harm from an error is orders of magnitude reduced.

Don’t think hard about it. Just pretend you’re deploying to a new host and manually doing whatever faffing around you’d have to do anyway if your existing server had some unrecoverable hardware failure. The only difference is that instead of typing the commands to do it after an administrative root@host# prompt on some freshly re-provisioned machine, you type it after a RUN statement in a Dockerfile.

Be Free

Now that you’ve built some images, rebuild them, including pulling new base images, every so often. Deploy them with docker run --restart=always ... and forget about them until you have time for another round of security updates. If the service breaks? Roll back to the previous image and worry about it later. Updating this way means you get to decide how much debugging effort it’s worth if something breaks in the rebuild, instead of inherently being down because of a bad update.

There. You’re done. Now you can go live your life instead of updating a million operating system packages.

Sadly, this advice is not universal. I certainly understand what it’s like to have a rat king of complexity containing services with interdependencies too complex to be trivially stuffed into a single container. ↩

What Am Container

Containers are a tool in the fight against evil.

docker containers programming Thursday October 27, 2016

Perhaps you are a software developer.

Perhaps, as a developer, you have recently become familiar with the term "containers".

Perhaps you have heard containers described as something like "LXC, but better", "an application-level interface to cgroups" or "like virtual machines, but lightweight", or perhaps (even less usefully), a function call. You've probably heard of "docker"; do you wonder whether a container is the same as, different from, or part of an Docker?

Are you are bewildered by the blisteringly fast-paced world of "containers"? Maybe you have no trouble understanding what they are - in fact you might be familiar with a half a dozen orchestration systems and container runtimes already - but frustrated because this seems like a whole lot of work and you just don't see what the point of it all is?

If so, this article is for you.

I'd like to lay out what exactly the point of "containers" are, why people are so excited about them, what makes the ecosystem around them so confusing. Unlike my previous writing on the topic, I'm not going to assume you know anything about the ecosystem in general; just that you have a basic understanding of how UNIX-like operating systems separate processes, files, and networks.¹

At the dawn of time, a computer was a single-tasking machine. Somehow, you'd load your program into main memory, and then you'd turn it on; it would run the program, and (if you're lucky) spit out some output onto paper tape.

When a program running on such a computer looked around itself, it could "see" the core memory of the computer it was running on, any attached devices, including consoles, printers, teletypes, or (later) networking equipment. This was of course very powerful - the program had full control of everything attached to the computer - but also somewhat limiting.

This mode of addressing hardware is limiting because it meant that programs would break the instant you moved them to a new computer. They had to be re-written to accommodate new amounts and types of memory, new sizes and brands of storage, new types of networks. If the program had to contain within itself the full knowledge of every piece of hardware that it might ever interact with, it would be very expensive indeed.

Also, if all the resources of a computer were dedicated to one program, then you couldn't run a second program without stomping all over the first one - crashing it by mangling its structures in memory, deleting its data by overwriting its data on disk.

So, programmers cleverly devised a way of indirecting, or "virtualizing", access to hardware resources. Instead of a program simply addressing all the memory in the whole computer, it got its own little space where it could address its own memory - an address space, if you will. If a program wanted more memory, it would ask a supervising program - what we today call a "kernel" - to give it some more memory. This made programs much simpler: instead of memorizing the address offsets where a particular machine kept its memory, a program would simply begin by saying "hey operating system, give me some memory", and then it would access the memory in its own little virtual area.

In other words: memory allocation is just virtual RAM.

Virtualizing memory - i.e. ephemeral storage - wasn't enough; in order to save and transfer data, programs also had to virtualize disk - i.e. persistent storage. Whereas a whole-computer program would just seek to position 0 on the disk and start writing data to it however it pleased, a program writing to a virtualized disk - or, as we might call it today, a "file" - first needed to request a file from the operating system.

In other words: file systems are just virtual disks.

Networking was treated in a similar way. Rather than addressing the entire network connection at once, each program could allocate a little slice of the network - a "port". That way a program could, instead of consuming all network traffic destined for the entire machine, ask the operating system to just deliver it all the traffic for, say, port number seven.

In other words: listening ports are just virtual network cards.

Getting bored by all this obvious stuff yet? Good. One of the things that frustrates me the most about containers is that they are an incredibly obvious idea that is just a logical continuation of a trend that all programmers are intimately familiar with.

All of these different virtual resources exist for the same reason: as I said earlier, if two programs need the same resource to function properly, and they both try to use it without coordinating, they'll both break horribly.²

UNIX-like operating systems more or less virtualize RAM correctly. When one program grabs some RAM, nobody else - modulo super-powered administrative debugging tools - gets to use it without talking to that program. It's extremely clear which memory belongs to which process. If programs want to use shared memory, there is a very specific, opt-in protocol for doing so; it is basically impossible for it to happen by accident.

However, the abstractions we use for disks (filesystems) and network cards (listening ports and addresses) are significantly more limited. Every program on the computer sees the same file-system. The program itself, and the data the program stores, both live on the same file-system. Every program on the computer can see the same network information, can query everything about it, and can receive arbitrary connections. Permissions can remove certain parts of the filesystem from view (i.e. programs can opt-out) but it is far less clear which program "owns" certain parts of the filesystem; access must be carefully controlled, and sometimes mediated by administrators.

In particular, the way that UNIX manages filesystems creates an environment where "installing" a program requires manipulating state in the same place (the filesystem) where other programs might require different state. Popular package managers on UNIX-like systems (APT, RPM, and so on) rarely have a way to separate program installation even by convention, let alone by strict enforcement. If you want to do that, you have to re-compile the software with ./configure --prefix to hard-code a new location. And, fundamentally, this is why the package managers don't support installing to a different place: if the program can tell the difference between different installation locations, then it will, because its developers thought it should go in one place on the file system, and why not hard code it? It works on their machine.

In order to address this shortcoming of the UNIX process model, the concept of "virtualization" became popular. The idea of virtualization is simple: you write a program which emulates an entire computer, with its own storage media, network devices, and then you install an operating system on it. This completely resolves the over-sharing of resources: a process inside a virtual machine is in a very real sense running on a different computer than programs running on a different virtual machine on the same physical device.

However, virtualiztion is also an extremly heavy-weight blunt instrument. Since virtual machines are running operating systems designed for physical machines, they have tons of redundant hardware-management code; enormous amounts of operating system data which could be shared with the host, but since it's in the form of a disk image totally managed by the virtual machine's operating system, the host can't really peek inside to optimize anything. It also makes other kinds of intentional resource sharing very hard: any software to manage the host needs to be installed on the host, since if it is installed on the guest it won't have full access to the host's hardware.

I hate using the term "heavy-weight" when I'm talking about software - it's often bandied about as a content-free criticism - but the difference in overhead between running a virtual machine and a process is the difference between gigabytes and kilobytes; somewhere between 4-6 orders of magnitude. That's a huge difference.

This means that you need to treat virtual machines as multi-purpose, since one VM is too big to run just a single small program. Which means you often have to manage them almost as if they were physical harware.

When we run a program on a UNIX-like operating system, and by so running it, grant it its very own address space, we call the entity that we just created a "process".

This is how to understand a "container".

A "container" is what we get when we run a program and give it not just its own memory, but its own whole virtual filesystem and its own whole virtual network card.

The metaphor to processes isn't perfect, because a container can contain multiple processes with different memory spaces that share a single filesystem. But this is also where some of the "container ecosystem" fervor begins to creep in - this is why people interested in containers will religiously exhort you to treat a container as a single application, not to run multiple things inside it, not to SSH into it, and so on. This is because the whole point of containers is that they are lightweight - far closer in overhead to the size of a process than that of a virtual machine.

A process inside a container, if it queries the operating system, will see a computer where only it is running, where it owns the entire filesystem, and where any mounted disks were explicitly put there by the administrator who ran the container. In other words, if it wants to share data with another application, it has to be given the shared data; opt-in, not opt-out, the same way that memory-sharing is opt-in in a UNIX-like system.

So why is this so exciting?

In a sense, it really is just a lower-overhead way to run a virtual machine, as long as it shares the same kernel. That's not super exciting, by itself.

The reason that containers are more exciting than processes is the same reason that using a filesystem is more exciting than having to use a whole disk: sharing state always, inevitably, leads to brokenness. Opt-in is better than opt-out.

When you give a program a whole filesystem to itself, sharing any data explicitly, you eliminate even the possibility that some other program scribbling on a shared area of the filesystem might break it. You don't need package managers any more, only package installers; by removing the other functions of package managers (inventory, removal) they can be radically simplified, and less complexity means less brokenness.

When you give a program an entire network address to itself, exposing any ports explicitly, you eliminate even the possibility that some rogue program will expose a security hole by listening on a port you weren't expecting. You eliminate the possibility that it might clash with other programs on the same host, hard-coding the same port numbers or auto-discovering the same addresses.

In addition to the exciting things on the run-time side, containers - or rather, the things you run to get containers, "images"³, present some compelling improvements to the build-time side.

On Linux and Windows, building a software artifact for distribution to end-users can be quite challenging. It's challenging because it's not clear how to specify that you depend on certain other software being installed; it's not clear what to do if you have conflicting versions of that software that may not be the same as the versions already available on the user's computer. It's not clear where to put things on the filesystem. On Linux, this often just means getting all of your software from your operating system distributor.

You'll notice I said "Linux and Windows"; not the usual (linux, windows, mac) big-3 desktop platforms, and I didn't say anything about mobile OSes. That's because on macOS, Android, iOS, and Windows Metro, applications already run in their own containers. The rules of macOS containers are a bit weird, and very different from Docker containers, but if you have a Mac you can check out ~/Library/Containers to see the view of the world that the applications you're running can see. iOS looks much the same.

This is something that doesn't get discussed a lot in the container ecosystem, partially because everyone is developing technology at such a breakneck pace, but in many ways Linux server-side containerization is just a continuation of a trend that started on mainframe operating systems in the 1970s and has already been picked up in full force by mobile operating systems.

When one builds an image, one is building a picture of the entire filesystem that the container will see, so an image is a complete artifact. By contrast, a package for a Linux package manager is just a fragment of a program, leaving out all of its dependencies, to be integrated later. If an image runs on your machine, it will (except in some extremely unusual circumstances) run on the target machine, because everything it needs to run is fully included.

Because you build all the software an image requires into the image itself, there are some implications for server management. You no longer need to apply security updates to a machine - they get applied to one application at a time, and they get applied as a normal process of deploying new code. Since there's only one update process, which is "delete the old container, run a new one with a new image", updates can roll out much faster, because you can build an image, run tests for the image with the security updates applied, and be confident that it won't break anything. No more scheduling maintenance windows, or managing reboots (at least for security updates to applications and libraries; kernel updates are a different kettle of fish).

That's why it's exciting. So why's it all so confusing?⁵

Fundamentally the confusion is caused by there just being way too many tools. Why so many tools? Once you've accepted that your software should live in images, none of the old tools work any more. Almost every administrative, monitoring, or management tool for UNIX-like OSes depends intimately upon the ability to promiscuously share the entire filesystem with every other program running on it. Containers break these assumptions, and so new tools need to be built. Nobody really agrees on how those tools should work, and a wide variety of forces ranging from competitive pressure to personality conflicts make it difficult for the panoply of container vendors to collaborate perfectly⁴.

Many companies whose core business has nothing to do with infrastructure have gone through this reasoning process:

Containers are so much better than processes, we need to start using them right away, even if there's some tooling pain in adopting them.
The old tools don't work.
The new tools from the tool vendors aren't ready.
The new tools from the community don't work for our use-case.
Time to write our own tool, just for our use-case and nobody else's! (Which causes problem #3 for somebody else, of course...)

A less fundamental reason is too much focus on scale. If you're running a small-scale web application which has a stable user-base that you don't expect a lot of growth in, there are many great reasons to adopt containers as opposed to automating your operations; and in fact, if you keep things simple, the very fact that your software runs in a container might obviate the need for a system-management solution like Chef, Ansible, Puppet, or Salt. You should totally adopt them and try to ignore the more complex and involved parts of running an orchestration system.

However, containers are even more useful at significant scale, which means that companies which have significant scaling problems invest in containers heavily and write about them prolifically. Many guides and tutorials on containers assume that you expect to be running a multi-million-node cluster with fully automated continuous deployment, blue-green zero-downtime deploys, a 1000-person operations team. It's great if you've got all that stuff, but building each of those components is a non-trivial investment.

So, where does that leave you, my dear reader?

You should absolutely be adopting "container technology", which is to say, you should probably at least be using Docker to build your software. But there are other, radically different container systems - like Sandstorm - which might make sense for you, depending on what kind of services you create. And of course there's a huge ecosystem of other tools you might want to use; too many to mention, although I will shout out to my own employer's docker-as-a-service Carina, which delivered this blog post, among other things, to you.

You shouldn't feel as though you need to do containers absolutely "the right way", or that the value of containerization is derived from adopting every single tool that you can all at once. The value of containers comes from four very simple things:

It reduces the overhead and increases the performance of co-locating multiple applications on the same hardware,
It forces you to explicitly call out any shared state or required resources,
It creates a complete build pipeline that results in a software artifact that can be run without special installation or set-up instructions (at least, on the "software installation" side; you still might require configuration, of course), and
It gives you a way to test exactly what you're deploying.

These benefits can combine and interact in surprising and interesting ways, and can be enhanced with a wide and growing variety of tools. But underneath all the hype and the buzz, the very real benefit of containerization is basically just that it is fixing a very old design flaw in UNIX.

Containers let you share less state, and shared mutable state is the root of all evil.

If you have a more sophisticated understanding of memory, disks, and networks, you'll notice that everything I'm saying here is patently false, and betrays an overly simplistic understanding of the development of UNIX and the complexities of physical hardware and driver software. Please believe that I know this; this is an alternate history of the version of UNIX that was developed on platonically ideal hardware. The messy co-evolution of UNIX, preemptive multitasking, hardware offload for networks, magnetic secondary storage, and so on, is far too large to fit into the margins of this post. ↩
When programs break horribly like this, it's called "multithreading". I have written some software to help you avoid it. ↩
One runs an "executable" to get a process; one runs an "image" to get a container. ↩
Although the container ecosystem is famously acrimonious, companies in it do actually collaborate better than the tech press sometimes give them credit for; the Open Container Project is a significant extraction of common technology from multiple vendors, many of whom are also competitors, to facilitate a technical substrate that is best for the community. ↩
If it doesn't seem confusing to you, consider this absolute gem from the hilarious folks over at CircleCI. ↩

docker run glyph/rproxy

docker twisted Saturday October 22, 2016

Want to TLS-protect your co-located stack of vanity websites with Twisted and Let's Encrypt using HawkOwl's rproxy, but can't tolerate the bone-grinding tedium of a pip install? I built a docker image for you now, so it's now as simple as:

$ mkdir -p conf/certificates;
$ cat > conf/rproxy.ini << EOF;
> [rproxy]
> certificates=certificates
> http_ports=80
> https_ports=443
> [hosts]
> mysite.com_host=<other container host>
> mysite.com_port=8080
> EOF
$ docker run --restart=always -v "$(pwd)"/conf:/conf \
    -p 80:80 -p 443:443 \
    glyph/rproxy;

There are no docs to speak of, so if you're interested in the details, see the tree on github I built it from.

Modulo some handwaving about docker networking to get that <other container host> IP, that's pretty much it. Go forth and do likewise!

A Container Is A Function Call

There’s something missing from the Docker ecosystem: a type-checker.

docker programming Sunday August 14, 2016

It seems to me that the prevailing mental model among users of container technology¹ right now is that a container is a tiny little virtual machine. It’s like a machine in the sense that it is provisioned and deprovisioned by explicit decisions, and we talk about “booting” containers. We configure it sort of like we configure a machine; dropping a bunch of files into a volume, setting some environment variables.

In my mind though, a container is something fundamentally different than a VM. Rather than coming from the perspective of “let’s take a VM and make it smaller so we can do cool stuff” - get rid of the kernel, get rid of fixed memory allocations, get rid of emulated memory access and instructions, so we can provision more of them at higher density... I’m coming at it from the opposite direction.

For me, containers are “let’s take a program and made it bigger so we can do cool stuff”. Let’s add in the whole user-space filesystem so it’s got all the same bits every time, so we don’t need to worry about library management, so we can ship it around from computer to computer as a self-contained unit. Awesome!

Of course, there are other ecosystems that figured this out a really long time ago, but having it as a commodity within the most popular server deployment environment has changed things.

Of course, an individual container isn’t a whole program. That’s why we need tools like compose to put containers together into a functioning whole. This makes a container not just a program, but rather, a part of a program. And of course, we all know what the smaller parts of a program are called:

Functions.²

A container of course is not the function itself; the image is the function. A container itself is a function call.

Perceived through this lens, it becomes apparent that Docker is missing some pretty important information. As a tiny VM, it has all the parts you need: it has an operating system (in the docker build) the ability to boot and reboot (docker run), instrumentation (docker inspect) debugging (docker exec) etc. As a really big function, it’s strangely anemic.

Specifically: in every programming language worth its salt, we have a type system; some mechanism to identify what parameters a function will take, and what return value it will have.

You might find this weird coming from a Python person, a language where

def foo(a, b, c):
    return a.x(c.d(b))

is considered an acceptable level of type documentation by some³; there’s no requirement to say what a, b, and c are. However, just because the type system is implicit, that doesn’t mean it’s not there, even in the text of the program. Let’s consider, from reading this tiny example, what we can discover:

foo takes 3 arguments, their names are “a”, “b”, and “c”, and it returns a value.
Somewhere else in the codebase there’s an object with an x method, which takes a single argument and also returns a value.
The type of <unknown>.x’s argument is the same as the return type of another method somewhere in the codebase, <unknown-2>.d

And so on, and so on. At runtime each of these types takes on a specific, concrete value, with a type, and if you set a breakpoint and single-step into it with a debugger, you can see each of those types very easily. Also at runtime you will get TypeError exceptions telling you exactly what was wrong with what you tried to do at a number of points, if you make a mistake.

The analogy to containers isn’t exact; inputs and outputs aren’t obviously in the shape of “arguments” and “return values”, especially since containers tend to be long-running; but nevertheless, a container does have inputs and outputs in the form of env vars, network services, and volumes.

Let’s consider the “foo” of docker, which would be the middle tier of a 3-tier web application (cribbed from a real live example):

FROM pypy:2
RUN apt-get update -ym
RUN apt-get upgrade -ym
RUN apt-get install -ym libssl-dev libffi-dev
RUN pip install virtualenv
RUN mkdir -p /code/env
RUN virtualenv /code/env
RUN pwd

COPY requirements.txt /code/requirements.txt
RUN /code/env/bin/pip install -r /code/requirements.txt
COPY main /code/main
RUN chmod a+x /code/main

VOLUME /clf
VOLUME /site
VOLUME /etc/ssl/private

ENTRYPOINT ["/code/main"]

In this file, we can only see three inputs, which are filesystem locations: /clf, /site, and /etc/ssl/private. How is this different than our Python example, a language with supposedly “no type information”?

The image has no metadata explaining what might go in those locations, or what roles they serve. We have no way to annotate them within the Dockerfile.
What services does this container need to connect to in order to get its job done? What hostnames will it connect to, what ports, and what will it expect to find there? We have no way of knowing. It doesn’t say. Any errors about the failed connections will come in a custom format, possibly in logs, from the application itself, and not from docker.
What services does this container export? It could have used an EXPOSE line to give us a hint, but it doesn’t need to; and even if it did, all we’d have is a port number.
What environment variables does its code require? What format do they need to be in?
We do know that we could look in requirements.txt to figure out what libraries are going to be used, but in order to figure out what the service dependencies are, we’re going to need to read all of the code to all of them.

Of course, the one way that this example is unrealistic is that I deleted all the comments explaining all of those things. Indeed, best practice these days would be to include comments in your Dockerfiles, and include example compose files in your repository, to give users some hint as to how these things all wire together.

This sort of state isn’t entirely uncommon in programming languages. In fact, in this popular GitHub project you can see that large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.

That is the current state of the container ecosystem. We are at the “late ’60s assembly language” stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.

When you’re building an image, you’re building it for a particular purpose. You already pretty much know what you’re trying to do and what you’re going to need to do it.

When instantiated, the image is going to consume network services. This is not just a matter of hostnames and TCP ports; those services need to be providing a specific service, over a specific protocol. A generic reverse proxy might be able to handle an arbitrary HTTP endpoint, but an API client needs that specific API. A database admin tool might be OK with just “it’s a database” but an application needs a particular schema.
It’s going to consume environment variables. But not just any variables; the variables have to be in a particular format.
It’s going to consume volumes. The volumes need to contain data in a particular format, readable and writable by a particular UID.
It’s also going to produce all of these things; it may listen on a network service port, provision a database schema, or emit some text that needs to be fed back into an environment variable elsewhere.

Here’s a brief sketch of what I want to see in a Dockerfile to allow me to express this sort of thing:

FROM ...
RUN ...

LISTENS ON: TCP:80 FOR: org.ietf.http/com.example.my-application-api
CONNECTS TO: pgwritemaster.internal ON: TCP:5432 FOR: org.postgresql.db/com.example.my-app-schema
CONNECTS TO: {{ETCD_HOST}} ON: TCP:{{ETCD_PORT}} FOR: com.coreos.etcd/client-communication
ENVIRONMENT NEEDS: ETCD_HOST FORMAT: HOST(com.coreos.etcd/client-communication)
ENVIRONMENT NEEDS: ETCD_PORT FORMAT: PORT(com.coreos.etcd/client-communication)
VOLUME AT: /logs FORMAT: org.w3.clf REQUIRES: WRITE UID: 4321

An image thusly built would refuse to run unless:

Somewhere else on its network, there was an etcd host/port known to it, its host and port supplied via environment variables.
Somewhere else on its network, there was a postgres host, listening on port 5432, with a name-resolution entry of “pgwritemaster.internal”.
An environment variable for the etcd configuration was supplied
A writable volume for /logs was supplied, owned by user-ID 4321 where it could write common log format logs.

There are probably a lot of flaws in the specific syntax here, but I hope you can see past that, to the broader point that the software inside a container has precise expectations of its environment, and that we presently have no way of communicating those expectations beyond writing a Melvilleian essay in each Dockerfile comments, beseeching those who would run the image to give it what it needs.

Why bother with this sort of work, if all the image can do with it is “refuse to run”?

First and foremost, today, the image effectively won’t run. Oh, it’ll start up, and it’ll consume some resources, but it will break when you try to do anything with it. What this metadata will allow the container runtime to do is to tell you why the image didn’t run, and give you specific, actionable, fast feedback about what you need to do in order to fix the problem. You won’t have to go groveling through logs; which is always especially hard if the back-end service you forgot to properly connect to was the log aggregation service. So this will be an order of magnitude speed improvement on initial deployments and development-environment setups for utility containers. Whole applications typically already come with a compose file, of course, but ideally applications would be built out of functioning self-contained pieces and not assembled one custom container at a time.

Secondly, if there were a strong tooling standard for providing this metadata within the image itself, it might become possible for infrastructure service providers (like, ahem, my employer) to automatically detect and satisfy service dependencies. Right now, if you have a database as a service that lives outside the container system in production, but within the container system in development and test, there’s no way for the orchestration layer to say “good news, everyone! you can find the database you need here: ...”.

My main interest is in allowing open source software developers to give service operators exactly what they need, so the upstream developers can get useful bug reports. There’s a constant tension where volunteer software developers find themselves fielding bug reports where someone deployed their code in a weird way, hacked it up to support some strange environment, built a derived container that had all kinds of extra junk in it to support service discovery or logging or somesuch, and so they don’t want to deal with the support load that that generates. Both people in that exchange are behaving reasonably. The developers gave the ops folks a container that runs their software to the best of their abilities. The service vendors made the minimal modifications they needed to have the container become a part of their service fabric. Yet we arrive at a scenario where nobody feels responsible for the resulting artifact.

If we could just say what it is that the container needs in order to really work, in a way which was precise and machine-readable, then it would be clear where the responsibility lies. Service providers could just run the container unmodified, and they’d know very clearly whether or not they’d satisfied its runtime requirements. Open source developers - or even commercial service vendors! - could say very clearly what they expected to be passed in, and when they got bug reports, they’d know exactly how their service should have behaved.

which mostly but not entirely just means “docker”; it’s weird, of course, because there are pieces that docker depends on and tools that build upon docker which are part of this, but docker remains the nexus. ↩
Yes yes, I know that they’re not really functions Tristan, they’re subroutines, but that’s the word people use for “subroutines” nowadays. ↩
Just to be clear: no it isn’t. Write a damn docstring, or at least some type annotations. ↩

Deploying Python Applications with Docker - A Suggestion

A template for deploying Python applications into Docker containers.

python deployment docker ops Friday March 06, 2015

Deploying python applications is much trickier than it should be.

Docker can simplify this, but even with Docker, there are a lot of nuances around how you package your python application, how you build it, how you pull in your python and non-python dependencies, and how you structure your images.

I would like to share with you a strategy that I have developed for deploying Python apps that deals with a number of these issues. I don’t want to claim that this is the only way to deploy Python apps, or even a particularly right way; in the rapidly evolving containerization ecosystem, new techniques pop up every day, and everyone’s application is different. However, I humbly submit that this process is a good default.

Rather than equivocate further about its abstract goodness, here are some properties of the following container construction idiom:

It reduces build times from a naive “sudo setup.py install” by using Python wheels to cache repeatably built binary artifacts.
It reduces container size by separating build containers from run containers.
It is independent of other tooling, and should work fine with whatever configuration management or container orchestration system you want to use.
It uses existing Python tooling of pip and virtualenv, and therefore doesn’t depend heavily on Docker. A lot of the same concepts apply if you have to build or deploy the same Python code into a non-containerized environment. You can also incrementally migrate towards containerization: if your deploy environment is not containerized, you can still build and test your wheels within a container and get the advantages of containerization there, as long as your base image matches the non-containerized environment you’re deploying to. This means you can quickly upgrade your build and test environments without having to upgrade the host environment on finicky continuous integration hosts, such as Jenkins or Buildbot.

To test these instructions, I used Docker 1.5.0 (via boot2docker, but hopefully that is an irrelevant detail). I also used an Ubuntu 14.04 base image (as you can see in the docker files) but hopefully the concepts should translate to other base images as well.

In order to show how to deploy a sample application, we’ll need a sample application to deploy; to keep it simple, here’s some “hello world” sample code using Klein:

# deployme/__init__.py
from klein import run, route

@route('/')
def home(request):
    request.setHeader("content-type", "text/plain")
    return 'Hello, world!'

def main():
    run("", 8081)

And an accompanying setup.py:

from setuptools import setup, find_packages

setup (
    name             = "DeployMe",
    version          = "0.1",
    description      = "Example application to be deployed.",
    packages         = find_packages(),
    install_requires = ["twisted>=15.0.0",
                        "klein>=15.0.0",
                        "treq>=15.0.0",
                        "service_identity>=14.0.0"],
    entry_points     = {'console_scripts':
                        ['run-the-app = deployme:main']}
)

Generating certificates is a bit tedious for a simple example like this one, but in a real-life application we are likely to face the deployment issue of native dependencies, so to demonstrate how to deal with that issue, that this setup.py depends on the service_identity module, which pulls in cryptography (which depends on OpenSSL) and its dependency cffi (which depends on libffi).

To get started telling Docker what to do, we’ll need a base image that we can use for both build and run images, to ensure that certain things match up; particularly the native libraries that are used to build against. This also speeds up subsquent builds, by giving a nice common point for caching.

In this base image, we’ll set up:

a Python runtime (PyPy)
the C libraries we need (the libffi6 and openssl ubuntu packages)
a virtual environment in which to do our building and packaging

# base.docker
FROM ubuntu:trusty

RUN echo "deb http://ppa.launchpad.net/pypy/ppa/ubuntu trusty main" > \
    /etc/apt/sources.list.d/pypy-ppa.list

RUN apt-key adv --keyserver keyserver.ubuntu.com \
                --recv-keys 2862D0785AFACD8C65B23DB0251104D968854915
RUN apt-get update

RUN apt-get install -qyy \
    -o APT::Install-Recommends=false -o APT::Install-Suggests=false \
    python-virtualenv pypy libffi6 openssl

RUN virtualenv -p /usr/bin/pypy /appenv
RUN . /appenv/bin/activate; pip install pip==6.0.8

The apt options APT::Install-Recommends and APT::Install-Suggests are just there to prevent python-virtualenv from pulling in a whole C development toolchain with it; we’ll get to that stuff in the build container. In the run container, which is also based on this base container, we will just use virtualenv and pip for putting the already-built artifacts into the right place. Ubuntu expects that these are purely development tools, which is why it recommends installation of python development tools as well.

You might wonder “why bother with a virtualenv if I’m already in a container”? This is belt-and-suspenders isolation, but you can never have too much isolation.

It’s true that in many cases, perhaps even most, simply installing stuff into the system Python with Pip works fine; however, for more elaborate applications, you may end up wanting to invoke a tool provided by your base container that is implemented in Python, but which requires dependencies managed by the host. By putting things into a virtualenv regardless, we keep the things set up by the base image’s package system tidily separated from the things our application is building, which means that there should be no unforseen interactions, regardless of how complex the application’s usage of Python might be.

Next we need to build the base image, which is accomplished easily enough with a docker command like:

$ docker build -t deployme-base -f base.docker .;

Next, we need a container for building our application and its Python dependencies. The dockerfile for that is as follows:

# build.docker
FROM deployme-base

RUN apt-get install -qy libffi-dev libssl-dev pypy-dev
RUN . /appenv/bin/activate; \
    pip install wheel

ENV WHEELHOUSE=/wheelhouse
ENV PIP_WHEEL_DIR=/wheelhouse
ENV PIP_FIND_LINKS=/wheelhouse

VOLUME /wheelhouse
VOLUME /application

ENTRYPOINT . /appenv/bin/activate; \
           cd /application; \
           pip wheel .

Breaking this down, we first have it pulling from the base image we just built. Then, we install the development libraries and headers for each of the C-level dependencies we have to work with, as well as PyPy’s development toolchain itself. Then, to get ready to build some wheels, we install the wheel package into the virtualenv we set up in the base image. Note that the wheel package is only necessary for building wheels; the functionality to install them is built in to pip.

Note that we then have two volumes: /wheelhouse, where the wheel output should go, and /application, where the application’s distribution (i.e. the directory containing setup.py) should go.

The entrypoint for this image is simply running “pip wheel” with the appropriate virtualenv activated. It runs against whatever is in the /application volume, so we could potentially build wheels for multiple different applications. In this example, I’m using pip wheel . which builds the current directory, but you may have a requirements.txt which pins all your dependencies, in which case you might want to use pip wheel -r requirements.txt instead.

At this point, we need to build the builder image, which can be accomplished with:

$ docker build -t deployme-builder -f build.docker .;

This builds a deployme-builder that we can use to build the wheels for the application. Since this is a prerequisite step for building the application container itself, you can go ahead and do that now. In order to do so, we must tell the builder to use the current directory as the application being built (the volume at /application) and to put the wheels into a wheelhouse directory (one called wheelhouse will do):

$ mkdir -p wheelhouse;
$ docker run --rm \
         -v "$(pwd)":/application \
         -v "$(pwd)"/wheelhouse:/wheelhouse \
         deployme-builder;

After running this, if you look in the wheelhouse directory, you should see a bunch of wheels built there, including one for the application being built:

$ ls wheelhouse
DeployMe-0.1-py2-none-any.whl
Twisted-15.0.0-pp27-none-linux_x86_64.whl
Werkzeug-0.10.1-py2-none-any.whl
cffi-0.9.0-py2-none-any.whl
# ...

At last, time to build the application container itself. The setup for that is very short, since most of the work has already been done for us in the production of the wheels:

# run.docker
FROM deployme-base

ADD wheelhouse /wheelhouse
RUN . /appenv/bin/activate; \
    pip install --no-index -f wheelhouse DeployMe

EXPOSE 8081

ENTRYPOINT . /appenv/bin/activate; \
           run-the-app

During build, this dockerfile pulls from our shared base image, then adds the wheelhouse we just produced as a directory at /wheelhouse. The only shell command that needs to run in order to get the wheels installed is pip install TheApplicationYouJustBuilt, with two options: --no-index to tell pip “don’t bother downloading anything from PyPI, everything you need should be right here”, and, -f wheelhouse which tells it where “here” is.

The entrypoint for this one activates the virtualenv and invokes run-the-app, the setuptools entrypoint defined above in setup.py, which should be on the $PATH once that virtualenv is activated.

The application build is very simple, just

$ docker build -t deployme-run -f run.docker .;

to build the docker file.

Similarly, running the application is just like any other docker container:

$ docker run --rm -it -p 8081:8081 deployme-run

You can then hit port 8081 on your docker host to load the application.

The command-line for docker run here is just an example; for example, I’m passing --rm so that if you run this example just so that it won’t clutter up your container list. Your environment will have its own way to call docker run, how to get your VOLUMEs and EXPOSEd ports mapped, and discussing how to orchestrate your containers is out of scope for this post; you can pretty much run it however you like. Everything the image needs is built in at this point.

To review:

have a common base container that contains all your non-Python (C libraries and utilities) dependencies. Avoid installing development tools here.
use a virtualenv even though you’re in a container to avoid any surprises from the host Python environment
have a “build” container that just makes the virtualenv and puts wheel and pip into it, and runs pip wheel
run the build container with your application code in a volume as input and a wheelhouse volume as output
create an application container by starting from the same base image and, once again not installing any dev tools, pip install all the wheels that you just built, turning off access to PyPI for that installation so it goes quickly and deterministically based on the wheels you’ve built.

While this sample application uses Twisted, it’s quite possible to apply this same process to just about any Python application you want to run inside Docker.

I’ve put a sample project up on Github which contain all the files referenced here, as well as “build” and “run” shell scripts that combine the necessary docker commandlines to go through the full process to build and run this sample app. While it defaults to the PyPy runtime (as most networked Python apps generally should these days, since performance is so much better than CPython), if you have an application with a hard CPython dependency, I’ve also made a branch and pull request on that project for CPython, and you can look at the relatively minor patch required to get it working for CPython as well.

Now that you have a container with an application in it that you might want to deploy, my previous write-up on a quick way to securely push stuff to a production service might be of interest.

(Once again, thanks to my employer, Rackspace, for sponsoring the time for me to write this post. Thanks also to Shawn Ashlee and Jesse Spears for helping me refine these ideas and listening to me rant about them. However, that expression of gratitude should not be taken as any kind of endorsement from any of these parties as to my technical suggestions or opinions here, as they are entirely my own.)