Perhaps you are a software developer.
Perhaps, as a developer, you have recently become familiar with the term "containers".
Perhaps you have heard containers described as something like "LXC, but
better", "an application-level interface to cgroups" or "like virtual machines,
but lightweight", or perhaps (even less
usefully), a function call. You've probably heard
of "docker"; do you wonder whether a container is the same as, different from,
or part of Docker?
Are you bewildered by the blisteringly fast-paced world of "containers"?
Maybe you have no trouble understanding what they are - in fact you might be
familiar with half a dozen orchestration systems and container runtimes
already - but frustrated because this seems like a whole lot of work and you
just don't see what the point of it all is?
If so, this article is for you.
I'd like to lay out what exactly the point of "containers" is, why people
are so excited about them, and what makes the ecosystem around them so confusing.
Unlike my previous writing on the topic, I'm not going to assume you know
anything about the ecosystem in general; just that you have a basic
understanding of how UNIX-like operating systems separate processes, files, and
networks.
At the dawn of time, a computer was a single-tasking machine. Somehow, you'd
load your program into main memory, and then you'd turn it on; it would run the
program, and (if you're lucky) spit out some output onto paper tape.
When a program running on such a computer looked around itself, it could "see"
the core memory of the computer it was running on, and any attached devices,
including consoles, printers, teletypes, or (later) networking equipment. This
was of course very powerful - the program had full control of everything
attached to the computer - but also somewhat limiting.
This mode of addressing hardware was limiting because it meant that programs
would break the instant you moved them to a new computer. They had to be
re-written to accommodate new amounts and types of memory, new sizes and brands
of storage, new types of networks. If the program had to contain within itself
the full knowledge of every piece of hardware that it might ever interact with,
it would be very expensive indeed.
Also, if all the resources of a computer were dedicated to one program, then
you couldn't run a second program without stomping all over the first one -
crashing it by mangling its structures in memory, deleting its data by
overwriting its data on disk.
So, programmers cleverly devised a way of indirecting, or "virtualizing",
access to hardware resources. Instead of a program simply addressing all the
memory in the whole computer, it got its own little space where it could
address its own memory -
an address space, if you will.
If a program wanted more memory, it would ask a supervising program - what we
today call a "kernel" - to give it some more memory. This made programs much
simpler: instead of memorizing the address offsets where a particular machine
kept its memory, a program would simply begin by saying "hey operating system,
give me some memory", and then it would access the memory in its own little
virtual area.
In other words: memory allocation is just virtual RAM.
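To make that concrete, here's a tiny Python sketch of that interaction; the buffer and the values in it are made up, but the shape of the request is the point:

```python
# Each process asks the operating system (via the language runtime) for some
# memory, and then addresses it entirely within its own private address space.
buffer = bytearray(1024 * 1024)  # "hey operating system, give me a megabyte"
buffer[0] = 42                   # an address meaningful to *this* process only

# Another process running this same code gets its own, completely separate
# megabyte; neither one can see or mangle the other's.
```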
Virtualizing memory - i.e. ephemeral storage - wasn't enough; in order to save
and transfer data, programs also had to virtualize disk - i.e. persistent
storage. Whereas a whole-computer program would just seek to position 0 on the
disk and start writing data to it however it pleased, a program writing to a
virtualized disk - or, as we might call it today, a "file" - first needed to
request a file from the operating system.
In other words: file systems are just virtual disks.
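Again, a tiny Python sketch; the filename is just an illustrative stand-in:

```python
# Instead of seeking to position 0 on the raw disk, a program asks the
# operating system for a file - its own little virtual disk - and writes
# wherever it pleases inside that.
with open("notes.txt", "w") as f:
    f.write("my data\n")

# Other programs' data lives in other files; nobody's "position 0" collides
# with anybody else's.
```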
Networking was treated in a similar way. Rather than addressing the entire
network connection at once, each program could allocate a little slice of the
network - a "port".
That way a program could, instead of consuming all network traffic destined for
the entire machine, ask the operating system to just deliver it all the traffic
for, say, port number seven.
In other words: listening ports are just virtual network cards.
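Here's what that request looks like as a Python sketch. (Port 7 - the classic echo port - is privileged on most systems, so this uses an arbitrary high port instead; the number is not important.)

```python
import socket

# Ask the operating system for one little slice of the network: a port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 8007))  # "just deliver me the traffic for port 8007"
server.listen()

# Only connections destined for port 8007 ever show up here; traffic for the
# rest of the machine is somebody else's problem.
connection, address = server.accept()
```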
Getting bored by all this obvious stuff yet? Good. One of the things that
frustrates me the most about containers is that they are an incredibly obvious
idea that is just a logical continuation of a trend that all programmers are
intimately familiar with.
All of these different virtual resources exist for the same reason: as I said
earlier, if two programs need the same resource to function properly, and they
both try to use it without coordinating, they'll both break horribly.
UNIX-like operating systems more or less virtualize RAM correctly. When one
program grabs some RAM, nobody else - modulo super-powered administrative
debugging tools - gets to use it without talking to that program. It's
extremely clear which memory belongs to which process. If programs want to use
shared memory, there is
a
very specific, opt-in protocol
for doing so; it is basically impossible for it to happen by accident.
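For the curious, here's roughly what that opt-in protocol looks like from Python, using the standard library's multiprocessing.shared_memory module (Python 3.8+); the region name is arbitrary, and both sides have to agree on it explicitly:

```python
from multiprocessing import shared_memory

# One process deliberately creates a named region of shared memory...
region = shared_memory.SharedMemory(name="demo_region", create=True, size=1024)
region.buf[0] = 7

# ...and another process must deliberately attach to that same name to see it.
# There is no way to stumble into someone else's memory by accident.
other_view = shared_memory.SharedMemory(name="demo_region")
print(other_view.buf[0])  # 7

other_view.close()
region.close()
region.unlink()
```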
However, the abstractions we use for disks (filesystems) and network cards
(listening ports and addresses) are significantly more limited. Every program
on the computer sees the same filesystem. The program itself, and the data
the program stores, both live on the same filesystem. Every program on the
computer can see the same network information, can query everything about it,
and can receive arbitrary connections. Permissions can remove certain parts of
the filesystem from view (i.e. programs can opt-out) but it is far less clear
which program "owns" certain parts of the filesystem; access must be carefully
controlled, and sometimes mediated by administrators.
In particular, the way that UNIX manages filesystems creates an environment
where "installing" a program requires manipulating state in the same place (the
filesystem) where other programs might require different state. Popular
package managers on UNIX-like systems (APT, RPM, and so on) rarely have a way
to separate program installation even by convention, let alone by strict
enforcement. If you want to do that, you have to re-compile the software with
./configure --prefix
to hard-code a new location. And, fundamentally, this
is why the package managers don't support installing to a different place: if
the program can tell the difference between different installation locations,
then it will, because its developers thought it should go in one place on the
file system, and why not hard code it? It works on their machine.
In order to address this shortcoming of the UNIX process model, the concept of
"virtualization" became popular. The idea of virtualization is simple: you
write a program which emulates an entire computer, with its own storage media,
network devices, and then you install an operating system on it. This
completely resolves the over-sharing of resources: a process inside a virtual
machine is in a very real sense running on a different computer than programs
running on a different virtual machine on the same physical device.
However, virtualization is also an extremely heavy-weight blunt instrument.
Since virtual machines are running operating systems designed for physical
machines, they have tons of redundant hardware-management code, and enormous
amounts of operating-system data which could be shared with the host; but since
that data is in the form of a disk image managed entirely by the virtual machine's
operating system, the host can't really peek inside to optimize anything. It
also makes other kinds of intentional resource sharing very hard: any software
to manage the host needs to be installed on the host, since if it is installed
on the guest it won't have full access to the host's hardware.
I hate using the term "heavy-weight" when I'm talking about software - it's
often bandied about as a content-free criticism - but the difference in
overhead between running a virtual machine and a process is the difference
between gigabytes and kilobytes; somewhere between four and six orders of magnitude.
That's a huge difference.
This means that you need to treat virtual machines as multi-purpose, since one
VM is too big to run just a single small program, which means you often have
to manage them almost as if they were physical hardware.
When we run a program on a UNIX-like operating system, and by so running it,
grant it its very own address space, we call the entity that we just created a
"process".
This is how to understand a "container".
A "container" is what we get when we run a program and give it not just its own
memory, but its own whole virtual filesystem and its own whole virtual
network card.
The analogy to processes isn't perfect, because a container can contain
multiple processes with different memory spaces that share a single
filesystem. But this is also where some of the "container ecosystem" fervor
begins to creep in - this is why people interested in containers will
religiously exhort you to treat a container as a single application, not to run
multiple things inside it, not to SSH into it, and so on. This is because the
whole point of containers is that they are lightweight - far closer in
overhead to the size of a process than that of a virtual machine.
A process inside a container, if it queries the operating system, will see a
computer where only it is running, where it owns the entire filesystem, and
where any mounted disks were explicitly put there by the administrator who ran
the container. In other words, if it wants to share data with another
application, it has to be given the shared data; opt-in, not opt-out, the
same way that memory-sharing is opt-in in a UNIX-like system.
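If you want to see this for yourself, here's a small Python sketch you could run both on a host and inside a container (with Docker, say) and compare the output; the details will vary, but the contrast is the point:

```python
import os

# What does "the computer" look like from in here?
pids = sorted(int(p) for p in os.listdir("/proc") if p.isdigit())
print("visible processes:", pids)
print("root filesystem:  ", sorted(os.listdir("/")))
print("hostname:         ", os.uname().nodename)

# Inside a container this typically reports just your own process (PID 1), a
# root filesystem containing only what the image put there, and the
# container's hostname rather than the host's.
```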
So why is this so exciting?
In a sense, it really is just a lower-overhead way to run a virtual machine, as
long as it shares the same kernel. That's not super exciting, by itself.
The reason that containers are more exciting than processes is the same reason
that using a filesystem is more exciting than having to use a whole disk:
sharing state always, inevitably, leads to brokenness. Opt-in is better than
opt-out.
When you give a program a whole filesystem to itself, sharing any data
explicitly, you eliminate even the possibility that some other program
scribbling on a shared area of the filesystem might break it. You don't need
package managers any more, only package installers; by removing the other
functions of package managers (inventory, removal) they can be radically
simplified, and less complexity means less brokenness.
When you give a program an entire network address to itself, exposing any ports
explicitly, you eliminate even the possibility that some rogue program will
expose a security hole by listening on a port you weren't expecting. You
eliminate the possibility that it might clash with other programs on the same
host, hard-coding the same port numbers or auto-discovering the same addresses.
In addition to the exciting things on the run-time side, containers - or
rather, the things you run to get containers, "images" - present some compelling
improvements to the build-time side.
On Linux and Windows, building a software artifact for distribution to
end-users can be quite challenging. It's challenging because it's not clear
how to specify that you depend on certain other software being installed; it's
not clear what to do if you have conflicting versions of that software that may
not be the same as the versions already available on the user's computer. It's
not clear where to put things on the filesystem. On Linux, this often just
means getting all of your software from your operating system distributor.
You'll notice I said "Linux and Windows"; not the usual (linux, windows, mac)
big-3 desktop platforms, and I didn't say anything about mobile OSes. That's
because on macOS, Android, iOS, and Windows Metro, applications already run in
their own containers. The rules of macOS containers are a bit weird, and very
different from Docker containers, but if you have a Mac you can check out
~/Library/Containers to see the view of the world that the applications you're
running can see. iOS looks much the same.
This is something that doesn't get discussed a lot in the container ecosystem,
partially because everyone is developing technology at such a breakneck pace,
but in many ways Linux server-side containerization is just a continuation of a
trend that started on mainframe operating systems in the 1970s and has already
been picked up in full force by mobile operating systems.
When one builds an image, one is building a picture of the entire filesystem
that the container will see, so an image is a complete artifact. By
contrast, a package for a Linux package manager is just a fragment of a
program, leaving out all of its dependencies, to be integrated later. If an
image runs on your machine, it will (except in some extremely unusual
circumstances) run on the target machine, because everything it needs to run is
fully included.
Because you build all the software an image requires into the image itself,
there are some implications for server management. You no longer need to apply
security updates to a machine - they get applied to one application at a
time, and they get applied as a normal process of deploying new code. Since
there's only one update process, which is "delete the old container, run a new
one with a new image", updates can roll out much faster, because you can build
an image, run tests for the image with the security updates applied, and be
confident that it won't break anything. No more scheduling maintenance
windows, or managing reboots (at least for security updates to applications and
libraries; kernel updates are a different kettle of fish).
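As a sketch of how small that update process can be, here's what it might look like driven from Python; the image and container names are hypothetical, and a real deployment would add health checks and error handling (or hand the whole job to an orchestrator):

```python
import subprocess

IMAGE = "example.com/myapp:1.2.3"   # hypothetical image name
CONTAINER = "myapp"                 # hypothetical container name

# Build a fresh image with the updated application and libraries baked in.
subprocess.run(["docker", "build", "-t", IMAGE, "."], check=True)

# Delete the old container (it may not exist yet, so don't fail if it doesn't)...
subprocess.run(["docker", "rm", "-f", CONTAINER], check=False)

# ...and run a new one from the new image. That's the entire update.
subprocess.run(
    ["docker", "run", "-d", "--name", CONTAINER, "-p", "8080:8080", IMAGE],
    check=True,
)
```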
That's why it's exciting. So why's it all so confusing?
Fundamentally the confusion is caused by there just being way too many tools.
Why so many tools? Once you've accepted that your software should live in
images, none of the old tools work any more. Almost every administrative,
monitoring, or management tool for UNIX-like OSes depends intimately upon the
ability to promiscuously share the entire filesystem with every other program
running on it. Containers break these assumptions, and so new tools need to be
built. Nobody really agrees on how those tools should work, and a wide variety
of forces ranging from competitive pressure to personality conflicts make it
difficult for the panoply of container vendors to collaborate perfectly.
Many companies whose core business has nothing to do with infrastructure have
gone through this reasoning process:
- Containers are so much better than processes, we need to start using them
right away, even if there's some tooling pain in adopting them.
- The old tools don't work.
- The new tools from the tool vendors aren't ready.
- The new tools from the community don't work for our use-case.
- Time to write our own tool, just for our use-case and nobody else's!
(Which causes problem #3 for somebody else, of course...)
A less fundamental reason is too much focus on scale. If you're running a
small-scale web application which has a stable user-base that you don't expect
a lot of growth in, there are many great reasons to adopt containers as opposed
to automating your operations; and in fact, if you keep things simple, the very
fact that your software runs in a container might obviate the need for a
system-management solution like Chef, Ansible, Puppet, or Salt. You should
totally adopt them and try to ignore the more complex and involved parts of
running an orchestration system.
However, containers are even more useful at significant scale, which means
that companies which have significant scaling problems invest in containers
heavily and write about them prolifically. Many guides and tutorials on
containers assume that you expect to be running a multi-million-node cluster
with fully automated continuous deployment, blue-green zero-downtime deploys, and
a 1000-person operations team. It's great if you've got all that stuff, but
building each of those components is a non-trivial investment.
So, where does that leave you, my dear reader?
You should absolutely be adopting "container technology", which is to say,
you should probably at least be using Docker to build
your software. But there are other, radically different container systems -
like Sandstorm - which might make sense for you,
depending on what kind of services you create. And of course there's a huge
ecosystem of other tools you might want to use; too many to mention, although I
will shout out to my own employer's
docker-as-a-service Carina, which delivered this blog
post, among other things, to you.
You shouldn't feel as though you need to do containers absolutely "the right
way", or that the value of containerization is derived from adopting every
single tool that you can all at once. The value of containers comes from four
very simple things:
- It reduces the overhead and increases the performance of co-locating
multiple applications on the same hardware,
- It forces you to explicitly call out any shared state or required
resources,
- It creates a complete build pipeline that results in a software artifact
that can be run without special installation or set-up instructions (at
least, on the "software installation" side; you still might require
configuration, of course), and
- It gives you a way to test exactly what you're deploying.
These benefits can combine and interact in surprising and interesting ways, and
can be enhanced with a wide and growing variety of tools. But underneath all
the hype and the buzz, the very real benefit of containerization is basically
just that it is fixing a very old design flaw in UNIX.
Containers let you share less state, and shared mutable state is the root of all evil.