The “Dependency Cutout” Workflow Pattern, Part I

It’s important to be able to fix bugs in your open source dependencies, and not just work around them.

Tell me if you’ve heard this one before.

You’re working on an application. Let’s call it “FooApp”. FooApp has a dependency on an open source library, let’s call it “LibBar”. You find a bug in LibBar that affects FooApp.

To envisage the best possible version of this scenario, let’s say you actively like LibBar, both technically and socially. You’ve contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar’s release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.

What do you do?

There are a few common reactions to this type of scenario, all of which are bad options.

I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:

  1. Find an alternative to LibBar, and switch to it.

    This is a bad idea because transitioning away from a core infrastructure component could be extremely expensive.

  2. Vendor LibBar into your codebase and fix your vendored version.

    This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo1: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.

  3. Monkey-patch LibBar to include your fix.

    This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you’re inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.

  4. Implement a workaround in your own code, rather than fixing it.

    This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar’s job, and unless you have a full wrapper for it in your own codebase, other engineers (including “yourself, personally”) might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.

  5. Implement the fix upstream in LibBar anyway, because that’s the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.

    This is a bad idea because you are betraying your users — by allowing the buggy behavior to persist — for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.

    As much as it’s nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.

    Even if we only care about the open source maintainers here, there’s still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.

    Much as it’s good to contribute the fix, it’s not great to put this on the maintainers.

The respective incentive structures of software development — specifically, of corporate application development and open source infrastructure development — make options 1-4 very common.

On the corporate / application side, these issues are:

  • it’s difficult for corporate developers to get clearance to spend even small amounts of their work hours on upstream open source projects, but clearance to spend time on the project they actually work on is implicit. If it takes 3 hours of wrangling with Legal2 and 3 hours of implementation work to fix the issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of implementation work in FooApp, a FooApp developer will often perceive it as “easier” to fix the issue downstream.

  • it’s difficult for corporate developers to get clearance from management to spend even small amounts of money sponsoring upstream reviewers, so even if they can find the time to contribute the fix, chances are high that it will remain stuck in review unless they are personally well-integrated members of the LibBar development team already.

  • even assuming there’s zero pressure whatsoever to avoid open sourcing the upstream changes, there’s still the fact inherent to any development team that FooApp’s developers will be more familiar with FooApp’s codebase and development processes than they are with LibBar’s. It’s just easier to work there, even if all other things are equal.

  • systems for tracking risk from open source dependencies often lack visibility into vendoring, particularly if you’re doing a hybrid approach and only vendoring a few things to address work in progress, rather than a comprehensive and disciplined approach to a monorepo. If you fully absorb a vendored dependency and then modify it, Dependabot isn’t going to tell you that a new version is available any more, because it won’t be present in your dependency list. Organizationally this is bad, of course, but from the perspective of an individual developer it manifests mostly as fewer annoying emails.

But there are problems on the open source side as well. Those problems are all derived from one big issue: because we’re often working with relatively small sums of money, it’s hard for upstream open source developers to consume either money or patches from application developers. It’s nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person’s full-time job to worry about this stuff, there’s often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.

These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don’t reliably contribute back.

The Answer We Want

All those options are bad. If we had a good option, what would it look like?

It is both practically necessary3 and morally required4 for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.

Below, I will describe a desirable abstract workflow for achieving this goal.

Step 0: Report the Problem

Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue, specifically an issue rather than a pull request: describe the problem before submitting a solution.

You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you’re planning on doing.

If you don’t hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.

If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don’t consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you’ve misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.

Step 1: Source Code and CI Setup

Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application’s use of the dependency. After all, you might want to fix more than one bug in LibBar.

You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you’ll need to have your own main branch that diverges from your upstream’s main branch.

Remember: you’re going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.

Depending on LibBar’s own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current5 main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).

Ideally for projects with really good CI and a strong “keep main release-ready at all times” policy, you can deploy straight from a development branch, but it’s good to take a moment to consider this before you get started. It’s usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.

Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.

Optional Bonus Step 1a: Artifact Management

If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you’ve already gone to the trouble of having an in-house package repository, you can pretend you’ve taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.

Step 2: Do The Fix

Now that you’ve got somewhere to edit LibBar’s code, you will want to actually fix the bug.

Step 2a: Local Filesystem Setup

Before you have a production version on your own deployed branch, you’ll want to test locally, which means having both repositories in a single integrated development environment.

At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.

This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you’ll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you’ll also want to be able to use FooApp to exercise the code and see if you’ve missed anything - and this, you wouldn’t get in CI, since LibBar can’t depend on FooApp itself.

In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.

Step 2b: Branch Setup for PR

However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp’s branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it’s passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.
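
In Python terms (part 2 will cover the tooling in much more depth), one way to express that second PR is a PEP 508 “direct reference” in FooApp’s dependency declaration. This is only a rough sketch, and every name and URL in it is hypothetical:

# FooApp's setup.py (hypothetical names and URLs throughout): the second PR
# described above amounts to this one-line change to the LibBar requirement.
from setuptools import setup

setup(
    name="fooapp",
    packages=["fooapp"],
    install_requires=[
        # Before the cutout: "libbar>=2.3"
        # During the cutout: a direct reference to your internal fork, pinned
        # to the bug-fix branch (or, better, to an exact commit hash).
        "libbar @ git+https://github.com/yourorg/libbar@fooapp-bugfix",
        # In step 4a you might repoint this at upstream's repository at the
        # merge commit; in step 5 it becomes a normal "libbar>=2.4" again.
    ],
)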

At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.

Until it’s addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you’re going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.

Step 3: Deploy Internally

Now that you’re confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn’t going to break anything on your site, it’s time to deploy.

Some deployment history should help to provide evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn’t necessarily representative of that of all LibBar users.

Step 4: Propose Externally

You’ve got the fix, you’ve tested the fix, you’ve got the fix in your own production, you’ve told upstream you want to send them some changes. Now, it’s time to make the pull request.

You’re likely going to get some feedback on the PR, even if you think it’s already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you’ll need to address before LibBar’s maintainers can land it.

As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.

Step 4a: Hurry Up And Wait

If you’re lucky, upstream will land your changes to LibBar. But, there’s still no release version available. Here, you’ll have to stay in a holding pattern until upstream can finalize the release on their end.

Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pinned release version to a git hash of the LibBar version where your fix landed, in their repository.

Before you do this, check in with the LibBar core team and make sure that they understand that’s what you’re doing and they don’t have any wacky workflows which may involve rebasing or eliding that commit as part of their release process.

Step 5: Unwind Everything

Finally, you eventually want to stop carrying any patches and move back to an official released version that integrates your fix.

You want to do this because this is what the upstream will expect when you are reporting bugs. Part of the benefit of using open source is benefiting from the collective work to do bug-fixes and such, so you don’t want to be stuck off on a pinned git hash that the developers do not support for anyone else.

As I said in step 2b6, make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around is something that can potentially create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state at this point as well.

Up Next

This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I’ll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!


  1. if you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. But chances are you don’t have that; very few companies do. 

  2. In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate. 

  3. c.f. @mcc@mastodon.social 

  4. c.f. @geofft@mastodon.social 

  5. In an ideal world every project would keep its main branch ready to release at all times, no matter what, but we do not live in an ideal world. 

  6. In this case, there is no question. It’s 2b only, no not-2b. 

Against Innovation Tokens

The “innovation token” model for selecting technologies is bad, and here’s why.

Updated 2024-07-04: After some discussion, added an epilogue going into more detail about the value of the distinction between the two types of tokens.

In 2015, Dan McKinley laid out a model for software teams selecting technologies. He proposed that each team have a limited supply of “innovation tokens”, and, when selecting a technology, they can choose boring ones for free but “innovative” ones cost a token. This implies that we all know which technologies are innovative, and we assume that they are inherently costly, so we want to restrict their supply.

That model has become popular to the point that it is now part of the vernacular. In many discussions, it is accepted as received wisdom, or even common sense.

In this post I aim to show you that despite being superficially helpful, this model is wrong, and in fact, may be counterproductive. I believe it is an attractive nuisance in computer programming discourse.

In fairness to Mr. McKinley, the model he described in that post is:

  1. nearly a decade old at this point, and
  2. much more nuanced in its description of the problem with “innovation” than the subsequent memetic mutation of the concept.

While I will be referencing McKinley’s post, and I do take some issue with it, I am reacting more strongly to the life of its own that this idea has taken on once it escaped its original context. There are a zillion worse posts rehashing this concept, on blogs and LinkedIn, but I won’t be linking to them because the goal is not to call anybody out.

To some extent I am re-raising McKinley’s own caveats and reinforcing them. So I may be arguing with a strawman, but it’s a strawman I have seen deployed with some regularity over the years.

To reduce it to its core, this strawman is “don’t use new or interesting technology, and if you have to, only use a little bit”.


Within the broader culture of programmers, an “innovation token” has become a shorthand to smear any technology perceived — almost always based on vibes, not data — as risky, and the adoption of novel approaches as pretentious and unserious. Speaking of programmer culture though, I do have to acknowledge there is also a pervasive tendency for us to get distracted by novelty and waste time on puzzles rather than problem-solving, so I understand where the reactionary attitude represented by the concept of an innovation token comes from.

But it is reactionary.

At its worst, it borders on anti-intellectualism. I have heard it used on more than one occasion as a thought-terminating cliche to discard a potentially promising new tool. But before I get into that, let me try to give a sympathetic summary of the idea, because the model is not entirely bad.

It has been popular for a long time because it does work okay as an heuristic.


The real problem that McKinley is describing is operational overhead. When programmers make a technology selection, we are often considering how difficult it will make the programming. Innovative technology selections are, by definition, less mature.

That lack of maturity — particularly in the open source world — often means that the project is in a part of its lifecycle where it is concerned with development affordances more than operational ones. Therefore, the stereotypical innovative project, even one which might legitimately be a big improvement to development velocity, will create more operational overhead. That operational overhead creates a hidden cost for the operations team later on.

This is a point I emphatically agree with. When selecting a technology, you should consider its ease of operation more than its ease of development. If your team is successful, they will be operating and maintaining it far longer than they are initially integrating and deploying it.

Furthermore, some operational overhead is inevitable. You will need to hire people to mitigate it. More popular, more mature projects will have a bigger talent pool to hire from, so your training costs will be lower, and those training costs are part of your operational cost too.

Rationing innovation tokens therefore can work as a reasonable heuristic, or proxy metric, for avoiding a mess of complex operational problems associated with dependencies that are expensive to operate and hard to hire for.


There are some minor issues I want to point out before getting to the overarching one.

  1. “has a lot of operational overhead” is a stereotype of a new technology, not an inherent property. If you want to reject a technology on the basis of being too high-overhead, at least look into its actual overhead a little bit. Sometimes, especially in 2024 as opposed to 2015, the point of a new, shiny piece of tech is to address operational issues that the more boring, older one had.
  2. “hard to learn” is also a stereotype; if “newer” meant “harder” then we would all be using troff rather than Google Docs. Actually ask if the innovativeness is making things harder or easier; don’t assume.
  3. You are going to have to train people on your stack no matter what. If a technology is adding a lot of value, it’s absolutely worth hiring for general ability and making a plan to teach people about it. You are going to have to do this with the core technology of your product anyway.

As I said, though, these are minor issues. The big problem with modeling operational overhead as an “innovation token” is that an even bigger concern than selecting an innovative tool is selecting too many tools.


The impulse to select more tools and make your operational environment more complex can be made worse by trying to avoid innovative tools. The important thing is not “less innovation”, but more consistency. To illustrate this, let’s do a simple thought experiment.

Let’s say you’re going to make a web app. There’s a tool in Haskell that you really like for a critical part of your app’s problem domain. You don’t want to spend more than one innovation token though, and everything in Haskell is inherently innovative, so you write a little service that just does that one part and you write the rest of your app in Ruby, calling into that service whenever you need to use that thing. This will appropriately restrict your “innovation token” expenditure.

Does doing this actually reduce your operational overhead, though?

First, you will have to find a team that likes both Ruby and Haskell and sees no problem using both. If you are not familiar with the cultural proclivities of these languages, suffice it to say that this is unlikely. Hiring for Haskell programmers is hard because there are fewer of them than Ruby programmers, but hiring for polyglot Haskell/Ruby programmers who are happy to do either is going to be really hard.

Since you will need to find different people to write in the different languages, even in the best case scenario, you will have two teams: the Haskell team and the Ruby team. Even if you are incredibly disciplined about inter-service responsibilities, there will be some areas where duplication of code is necessary across those services. Disagreements will arise and every one of these disagreements will be a source of social friction and software defects.

Then, you need to set up separate CI pipelines for each language, separate deployment systems, and of course, separate databases. Right away you are effectively doubling your workload.

In the worse, and unfortunately more likely scenario, there will be enormous infighting between these two teams. Operational incidents will be more difficult to manage because rather than learning the Haskell tools for operational visibility and disseminating that institutional knowledge amongst your team, you will be half-learning the lessons from two separate ecosystems and attempting to integrate them. Every on-call engineer will be frantically trying to learn a language ecosystem they don’t use regularly, or you will double the size of your on-call rotation. The Ruby team may start to resent the Haskell team for getting to exclusively work on the fun parts of the problem while they are doing things that look more like rote grunt work.


A better way to think about the problem of managing operational overhead is, rather than “innovation tokens”, consider “boundary tokens”.

That is to say, rather than evaluating the general sense of weird vibes from your architecture, consider the consistency of that architecture. If you’re using Haskell, use Haskell. You should be all-in on Haskell web frameworks, Haskell ORMs, Haskell OAuth integrations, and so on.1 To cross the boundary out of Haskell, you need to spend a boundary token, and you shouldn’t have many of those.

I submit that the increased operational overhead that you might experience with an all-Haskell tool selection will be dwarfed by the savings that you get by having a team that is aligned with each other, that can communicate easily, and that can share programs with each other without needing to first strategize about a channel for the two pieces of work to establish bidirectional communication. The ability to simply call a function when you need to call it is very powerful, and extremely underrated.

Consistency ought to apply at each layer of the stack; it is perhaps most obvious with programming languages, but it is true of web frameworks, test frameworks, cryptographic libraries, you name it. Make a choice and stick with it, because every deviation from that choice carries a significant cost. Moreover this cost is a hidden cost, in the same way that the operational downsides of an “innovative” tool that hasn’t seen much production use might be hidden.

Discarding a more standard tool in favor of a tool more consistent with your architecture extends even to fairly uncontroversial, ubiquitous tools. For example, one of my favorite architectural patterns is to forego the use of the venerable — and very boring — Cron, the UNIX task-scheduler. Instead of Cron, it can make a lot of sense to have hand-written bespoke code for scheduling tasks within the application. Within the “innovation tokens” model, this is a very silly waste of a token!


Just use Cron! Everybody knows how to use Cron!

Except… does everybody know how to use Cron? Here are some questions to consider, if you’re about to roll out a big dependency on Cron:

  1. How do you write a unit test for a scheduling rule with Cron?
  2. Can you even remember how to write a cron rule that runs at the times you want?
  3. How do you inject secrets and configuration variables into the distinct and somewhat idiosyncratic runtime execution environment of Cron?
  4. How do you know that you did that variable-injection properly until the job actually runs, possibly in the middle of the night?
  5. How do you deploy your monitoring and error-logging frameworks to observe your scripts run under Cron?
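
By contrast, a bespoke in-application scheduler can answer question 1 with an ordinary unit test. Here is a minimal sketch of the idea (the names are hypothetical, not taken from any particular library): a scheduling rule is just a pure function from “now” to “next run time”, so it can be exercised with plain datetime values, and the job itself runs inside the application, with its normal configuration, secrets, and logging.

from datetime import datetime, timedelta
from typing import Callable

def nightly_at(hour: int) -> Callable[[datetime], datetime]:
    """Build a scheduling rule: the next occurrence of HH:00 after 'now'."""
    def rule(now: datetime) -> datetime:
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        return candidate if candidate > now else candidate + timedelta(days=1)
    return rule

# The unit test for question 1 above is a single line, with no crontab syntax,
# no idiosyncratic execution environment, and no waiting until midnight:
assert nightly_at(3)(datetime(2024, 7, 4, 12, 0)) == datetime(2024, 7, 5, 3, 0)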

Granted, this architectural choice is less controversial than it once was. Cron used to be ambiently available on whatever servers you happened to be running. As container-based deployments have increased in popularity, this sense that Cron is just kinda around has gone away, and if you need to run a container that just runs Cron, much of the jankiness of its deployment is a lot more immediately visible.


There is friction at the boundary between things. That friction is a cost, but sometimes it’s a cost worth paying.

If there’s a really good library in Haskell and a really good library in Ruby and you really do want to use them both, maybe it makes sense to actually have multiple services. As your team gets larger and more mature, the need to bring in more tools, and the ability to handle the associated overhead, will only increase over time. But the place that the cost comes in the most is at the boundary between tools, not in the operational deficiencies of any one particular tool.

Even in a bog-standard web application with the most boring, least innovative tech stack imaginable (PHP, MySQL, HTML, CSS, JavaScript), many of the annoying points of friction are where different, inconsistent technologies make contact. If you are a programmer working on the web yourself, consider your own impression of the level of controversy of each of those technologies.

Consider that there are far more complex technical tools in terms of required skills to implement them, like computer vision or physics simulation, tools which are also pretty widely used, which consistently generate lower levels of controversy. People do have strong feelings about these things as well, of course, and it’s hard to find things to link to that show “this isn’t controversial”, but, like, search your feelings, you know it to be true.


You can see the benefits of the boundary token approach in programming language design. Many of the most influential and best-loved programming languages had an impact not by bundling together lots of tools, but by making everything into one thing:

  • LISP: everything is a list
  • Smalltalk: everything is an object
  • ML: everything is an algebraic data type
  • Forth: everything is a stack

There is a tremendous power in thinking about everything as a single kind of thing, because then you don’t have to juggle lots of different ideas about different kinds of things; you can just think about your problem.

When people complain about programming languages, they’re often complaining about how many different kinds of thing they have to remember in order to use it.

If you keep your boundary-token budget small, and allow your developers to accomplish as much as possible while staying within a solution space delineated by a single, clean cognitive boundary, I promise you can innovate as much as you want and your operational costs will remain manageable.


Epilogue

In subsequent Mastodon discussion of this post with Matt Campbell and Meejah, I realized that I may not have made it entirely clear why I feel the distinction between “boundary” and “innovation” tokens is important. I do say above that the “innovation token” model can be a useful heuristic, so why bother with a new, but slightly different heuristic? Especially since most experienced engineers - indeed, McKinley himself - would budget “innovation” quite similarly to “boundaries”, and might even consider the use of more “innovative” Haskell tools in my hypothetical scenario to not even be an expenditure of innovation tokens at all.

To answer that, I need to highlight the purpose of having heuristics like this in the first place. These are vague, nebulous guidelines, not hard and fast rules. I cannot give you a token calculator to plug your technical decisions into. The purpose of either token heuristic is to facilitate discussions among a team.

With a team of skilled and experienced engineers, the distinction is meaningless. Senior and staff engineers (at least, the ones who deserve their level) will intuit the goals behind “innovation tokens” and inherently consider things like operational overhead anyway. In practice, a high-performing, well-aligned team discussing innovation tokens and one discussing boundary tokens will look functionally indistinguishable.

The distinction starts to be important when you have management pressures, nervous executives, inexperienced engineers, a fresh team without existing consensus about core technology choices, and so on. That is to say, most teams that exist in the messy, perpetually in medias res world of the software industry.

If you are just getting started on a project and you have a bunch of competent but disagreeable engineers, the words “innovation” and “boundaries” function very differently.

If you ask, “is this an innovation” about a particular technical tool, you are asking your interlocutor to pull in a bunch of their skills and experience to subjectively evaluate the relative industry-wide, or maybe company-wide, or maybe team-wide2 newness of the thing being discussed. The discussion of whether it counts as boring or innovative is immediately fraught with a ton of subjective, difficult-to-quantify information about costs of hiring, difficulty of learning, and your impression of the feelings of hundreds or thousands of people outside of your team. And, yes, ultimately you do need to have an estimate of all that stuff, but starting your “is it OK to use this” conversation by simultaneously arguing about all those subjective judgments is setting yourself up for failure.

Instead, if you ask “does this introduce a boundary between two different technologies with different conceptual models”, while that is not a perfectly objective question, it is much easier for your team to answer, with much crisper intermediary factual questions. What are the two technologies? What are the models? How much do they differ? You can just hash out the answers to each one within the team directly, rather than needing to sift through the last few years of Stack Overflow developer surveys to determine relative adoption or popularity of technologies in the world at large.

Restricting your supply of either boundary or innovation tokens is a good idea, but achieving unanimity within your team about what your boundaries are is always going to be easier than deciding what your innovations are.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how can we make our architecture more consistent”.


  1. I gave a talk about this once, a very long time ago, where Haskell was Python. 

  2. It’s not clear, that’s a big part of the problem. 

Tips And Tricks for Shipping a PyGame App on the Mac

A quick and dirty guide to getting that little PyGame hack you did up and running on someone else’s Mac.

I have written a tool you can actually use rather than copying and pasting shell-script snippets, which you can read about in a new post here. I've done my best to update the accuracy of the information below as well, particularly with respect to which Python you want and why, but it is a much older post and I could easily have missed something.

I’ve written and spoken at some length about shipping software in the abstract. Sometimes I’ve even had the occasional concrete tidbit, but that advice wasn’t really complete.

In honor of Eevee’s delightful Games Made Quick???, I’d like to help you package your games even quicker than you made them.

Who is this for?

About ten years ago I made a prototype of a little PyGame thing which I wanted to share with a few friends. Building said prototype was quick and fun, and very different from the usual sort of work I do. But then, the project got just big enough that I started to wonder if it would be possible to share the result, and thus began the long winter of my discontent with packaging tools.

I might be the only one, but... I don’t think so. The history of PyWeek, for example, looks to be a history of games distributed as Github repositories, or, at best, apps which don’t launch. It seems like people who participate in game jams with Unity push a button and publish their games to Steam; people who participate in game jams with Python wander away once the build toolchain defeats them.

So: perhaps you’re also a Python programmer, and you’ve built something with PyGame, and you want to put it on your website so your friends can download it. Perhaps many or most of your friends and family are Mac users. Perhaps you tried to make a thing with py2app once, and got nothing but inscrutable tracebacks or corrupt app bundles for your trouble.

If so, read on and enjoy.

What changed?

If things didn’t work for me when I first tried to do this, what’s different now?

  • the packaging ecosystem in general is far less buggy, and py2app’s dependencies, like setuptools, have become far more reliable as well. Many thanks to Donald Stufft and the whole PyPA for that.
  • Binary wheels exist, and the community has been getting better and better at building self-contained wheels which include any necessary C libraries, relieving the burden on application authors to figure out gnarly C toolchain issues.
  • The PyGame project now ships just such wheels for a variety of Python versions on Mac, Windows, and Linux, which removes a whole huge pile of complexity both in generally understanding the C toolchain and specifically understanding the SDL build process.
  • py2app has been actively maintained and many bugs have been fixed - many thanks to Ronald Oussoren et al. for that.
  • I finally broke down and gave Apple a hundred dollars so I can produce an app that normal humans might actually be able to run.

There are still weird little corner cases you have to work around — hence this post — but mostly this is the story of how years of effort by the Python packaging community have resulted in tools that are pretty close to working out of the box now.

Step 0: Development Setup

You will want to use a virtual environment for development.

Finally: pip install all your requirements into your virtualenv, including PyGame itself.

Step 1: Make an icon

All good apps need an icon, right?

When I was young, one would open up ResEdit Resorcerer MPW CodeWarrior Project Builder Icon Composer Xcode and create a new ICON resource cicn resource .tiff file .icns file. Nowadays there’s some weird opaque stuff with xcassets files and Contents.json and “Copy Bundle Resources” in the default Swift and Objective C project templates and honestly I can’t be bothered to keep track of what’s going on with this nonsense any more.

Luckily the OS ships with the macOS-specific “scriptable image processing system”, which can helpfully convert an icon for you. Make yourself a 512x512 PNG file in your favorite image editor (with an alpha channel!) that you want to use as your icon, then run it something like this:

$ sips -s format icns Icon.png --out Icon.icns

somewhere in your build process, to produce an icon in the appropriate format.

There’s also one additional wrinkle with PyGame: once you’ve launched the game, PyGame helpfully assigns the cute, but ugly, default PyGame icon to your running process. To avoid this, you’ll need these two lines in your initialization code, somewhere before pygame.display.init (or, for that matter, pygame.display.<anything>):

from pygame.sdlmain_osx import InstallNSApplication
InstallNSApplication()

Obviously this is pretty Mac-specific, so you probably want to put it under some kind of platform-detection conditional, as sketched below.
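
A minimal version of that conditional might look like this (the sys.platform check is just one way to do the platform detection):

import sys

if sys.platform == "darwin":
    # Mac-only: install a proper NSApplication before the display comes up,
    # so the running process gets your icon rather than PyGame's default one.
    from pygame.sdlmain_osx import InstallNSApplication
    InstallNSApplication()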

Step 2: Include All The Dang Files, I Don’t Care About Performance

Unfortunately py2app still tries really hard to jam all your code into a .zip file, which breaks the world in various hilarious ways. Your app will probably have some resources you want to load, as will PyGame itself.

Supposedly, packages=["your_package"] in your setup.py should address this, and it comes with a “pygame” recipe, but neither of these things worked for me. Instead, I convinced py2app to splat out all the files by using the not-quite-public “recipe” plugin API:

import py2app.recipes
import py2app.build_app

from setuptools import find_packages, setup

pkgs = find_packages(".")

class recipe_plugin(object):
    # A minimal py2app "recipe": the check() hook tells py2app to keep these
    # packages as real, on-disk packages instead of stuffing them into the zip.
    @staticmethod
    def check(py2app_cmd, modulegraph):
        local_packages = pkgs[:]
        local_packages += ['pygame']
        return {
            "packages": local_packages,
        }

# Register the recipe so py2app runs it during the build.
py2app.recipes.my_recipe = recipe_plugin

APP = ['my_main_file.py']
DATA_FILES = []
OPTIONS = {}
OPTIONS.update(
    iconfile="Icon.icns",
    plist=dict(CFBundleIdentifier='com.example.yourdomain.notmine')
)

setup(
    name="Your Game",
    app=APP,
    data_files=DATA_FILES,
    include_package_data=True,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
    packages=pkgs,
    package_data={
        # Every kind of data file the game loads at runtime.
        "": ["*.gal" , "*.gif" , "*.html" , "*.jar" , "*.js" , "*.mid" ,
             "*.png" , "*.py" , "*.pyc" , "*.sh" , "*.tmx" , "*.ttf" ,
             # "*.xcf"
        ]
    },
)

This is definitely somewhat less efficient than py2app’s default of stuffing the code into a single zip file, but, as a counterpoint to that: it actually works.

Step 3: Build it

Hopefully, at this point you can do python setup.py py2app and get a shiny new app bundle in dist/$NAME.app. We haven’t had to go through the hell of quarantine yet, so it should launch at this point. If it doesn’t, sorry :-(.

You can often debug the more obvious fail-to-launch issues by running the executable directly from the command line: ./dist/$NAME.app/Contents/MacOS/$NAME. Although this runs in a slightly different environment than double-clicking (it will have all your shell’s env vars, for example, so if your app needs an env var to work it might mysteriously work there), it will also print any tracebacks to your terminal, where they’ll be slightly easier to find than in Console.app.

Once your app at least runs locally, it’s time to...

Step 4: Code sign it

All the tutorials that I’ve found on how to do this involve doing Xcode project goop where it’s not clear what’s happening underneath. But despite the fact that the introductory docs aren’t quite there, the underlying model for codesigning stuff is totally common across GUI and command-line cases. However, actually getting your cert requires Xcode, an Apple ID, and a credit card.

After paying your hundred dollars, go into Xcode, go to Accounts, hit “+”, “Apple ID”, then log in. Then, in your shiny new account, go to “Manage Certificates”, hit the little “+”, and (assuming, like me, you want to put something up on your own website, and not submit to the Mac App Store) choose Developer ID Application. You probably think you want “mac app distribution” because you are wanting to distribute a mac app! But you don’t.

Next, before you do anything else, make sure you have backups of your certificate and private key. You really don’t want to lose the private key associated with that cert.

Now quit Xcode; you’re done with the GUI.

You will need to know the identifier of your signing key though, which should be output from the command:

$ security find-identity -v -p codesigning | grep 'Developer ID' | sed -e 's/.*"\(.*\)"/\1/'

You probably want to put that in your build script, since you want to sign with the same identity every time. Further commands here will assume you’ve copied one of the lines of results from that command and done export IDENTITY="..." with it.

Step 4a: Become Aware Of New Annoying Requirements

Update for macOS Catalina: In Catalina, Apple has added a new code-signing requirement; even for apps distributed outside of the app store, they still have to be submitted to and approved by Apple.

In order to be notarized, you will need to codesign not only your app itself, but to also:

  1. add the hardened-runtime exception entitlements that allow Python to work, and
  2. directly sign every shared library that is part of your app bundle.

So the actual code-signing step is now a little more complicated.

Step 4b: Write An Entitlements Plist That Allows Python To Work

One of the features that notarization is intended to strongly encourage1 is the “hardened runtime”, a feature of macOS which opts in to stricter run-time behavior designed to stop malware. One thing that the hardened runtime does is to disable writable, executable memory, which is used by JITs, FFIs ... and malware.

Unfortunately, both Python’s built-in ctypes module and various popular bits of 3rd-party stuff that uses cffi, including pyOpenSSL, require writable, executable memory to work. Furthermore, py2app actually imports ctypes during its bootstrapping phase, so you can’t even get your own code to start running to perform any workarounds unless this is enabled. In other words, you need this exception just to use Python at all, not only if your project happens to use ctypes directly.

To make this long, sad story significantly shorter and happier, you can create an entitlements property list that enables the magical property which allows this to work. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.cs.allow-unsigned-executable-memory</key>
    <true/>
</dict>
</plist>

Subsequent steps assume that you’ve put this into a file called entitleme.plist in your project root.

Step 4c: SIGN ALL THE THINGS

Notarization also requires that all the executable files in your bundle, not just the main executable, are properly code-signed before submitting. So you’ll need to first run the codesign command across all your shared libraries, something like this:

$ cd dist
$ find "${NAME}.app" -iname '*.so' -or -iname '*.dylib' |
    while read libfile; do
        codesign --sign "${IDENTITY}" \
                 --entitlements ../entitleme.plist \
                 --deep "${libfile}" \
                 --force \
                 --options runtime;
    done;

Then finally, sign the bundle itself.

$ codesign --sign "${IDENTITY}" \
         --entitlements ../entitleme.plist \
         --deep "${NAME}.app" \
         --force \
         --options runtime;

Now, your app is code-signed.

Step 5: Archive it

The right way to do this is probably to use dmgbuild or something like it, but what I promised here was quick and dirty, not beautiful and best practices.

You have to make a Zip archive that preserves symbolic links. There are a couple of options for this:

  • open dist/, then in the Finder window that comes up, right click on the app and “compress” it
  • cd dist; zip -yr $NAME.app.zip $NAME.app

Most importantly, if you use the zip command line tool, you must use the -y option. Without it, your downloadable app bundle will be somewhat mysteriously broken even though the one before you zipped it will be fine.

Step 6: Actually The Rest Of Step 4: Request Notarization

Notarization is a 2-step process, which is somewhat resistant to fully automating. You submit to Apple, then they email you the results of doing the notarization, then if that email indicates that your notarization succeeded, you can “staple” the successful result to your bundle.

The thing you notarize is an archive, which is why you need to do step 5 first. Then, you need to do this:

$ xcrun altool --notarize-app \
      --file "${NAME}.app.zip" \
      --type osx \
      --username "${YOUR_DEVELOPER_ID_EMAIL}" \
      --primary-bundle-id="${YOUR_BUNDLE_ID}";

Be sure that YOUR_BUNDLE_ID matches the CFBundleIdentifier you told py2app about before, so that the tool can find your app bundle inside the archive.

You’ll also need to type in the iCloud password for your Developer ID account here.2

Step 6a: Wait A Minute

Anxiously check your email for an hour or so. Hope you don’t get any errors.

Step 6b: Finish Notarizing It, Finally!

Once Apple has a record of the app’s notarization, their tooling will recognize it, so you don’t need any information from the confirmation email or the previous command; just make sure that you are running this on the exact same .app directory you just built and archived and not a version that differs in any way.

$ xcrun stapler staple "./${NAME}.app";

Finally, you will want to archive it again:

$ zip -qyr "${NAME}.notarized.app.zip" "${NAME}.app";

Step 7: Download it

Ideally, at this point, everything should be working. But to make sure that code-signing and archiving and notarizing and re-archiving went correctly, you should have either a pristine virtual machine with no dev tools and no Python installed, or a non-programmer friend’s machine that can serve the same purpose. They probably need a relatively recent macOS - my own experience has shown that apps made using the above technique will definitely work on High Sierra (and later) and will definitely break on Yosemite (and earlier); they probably start working at some OS version between those.

There’s no tooling that I know of that can clearly tell you whether your mac app depends on some detail of your local machine. Even for your dependencies, there’s no auditwheel for macOS.

Updated 2019-06-27: It turns out there is an auditwheel-like thing for macOS: delocate! In fact, it predated and inspired auditwheel!

Thanks to Nathaniel Smith for the update (which he provided in, uh, January of 2018 and I’ve only just now gotten around to updating...).

Nevertheless, it’s always a good idea to check your final app build on a fresh computer before you announce it.

Coda

If you were expecting to get to the end and download my cool game, sorry to disappoint! It really is a half-broken prototype that is in no way ready for public consumption, and given my current load of personal and professional responsibilities, you definitely shouldn’t expect anything from me in this area any time soon, or, you know, ever.

But, from years of experience, I know that it’s nearly impossible to summon any motivation to work on small projects like this without the knowledge that the end result will be usable in some way, so I hope that this helps someone else set up their Python game-dev pipeline.

I’d really like to turn this into a 3-part series, with a part for Linux (perhaps using flatpak? is that a good thing?) and a part for Windows. However, given my aforementioned time constraints, I don’t think I’m going to have the time or energy to do that research, so if you’ve got the appropriate knowledge, I’d love to host a guest post on this blog, or even just a link to yours.

If this post helped you, if you have questions or corrections, or if you’d like to write the Linux or Windows version of this post, let me know.


  1. The hardened runtime was originally required when notarization was introduced. Apparently this broke too much software and now the requirement is relaxed until January 2020. But it’s probably best to treat it as if it is required, since the requirement is almost certainly coming back, and may in fact be back by the time you’re reading this. 

  2. You can pass it via the --password option but there are all kinds of security issues with that so I wouldn’t recommend it. 

Software You Can Use

The Python community needs a tool for distributing software to end users.

Python has a big problem. While it’s easy and fun to produce software in Python, it’s hard to produce software that people - especially laypeople who are not professional software developers - can use.

In the modern software ecosystem, there are a few places that you might want to use a program:

  1. On a server, by loading a web page.
  2. On a web page, by running it in your browser.
  3. As a command-line tool, on Mac,
  4. ... Windows,
  5. ... or Linux.
  6. As a desktop application, on Mac,
  7. ... Windows,
  8. ... or Linux.
  9. As a mobile application, on iOS,
  10. ... or Android,
  11. ... Or Windows Phone. (Just kidding.)

Out of these 10 scenarios, Python currently has half of a good deployment story for one of them: running an application on a server, as a back-end. This is a serious problem for the future of Python and one we need to figure out how to face as a community.

Even the “good” deployment story is somewhat convoluted, as you need to know about at least some Linux distribution’s package manager, and native dependencies, and Pip, and virtualenv, and wheels, and probably docker too.

If you want to run a Python application in your browser, your best bet right now is probably Brython. However, Brython is still in its infancy, and basic facilities like preparing your code for production to achieve acceptable start-up performance, and an explanation of how to use libraries, are missing. With big chunks like that missing it’s hard to advocate for Brython’s use in production.

Moving on to scenario 3, this may be one of the best-supported configurations; py2app actually works surprisingly well. But it’s still incredibly confusing for new users. Which native objects (dylibs and frameworks and data files) to bundle are options to py2app itself, and not something that can be handled automatically by libraries. So if, for example, PyGame depends on SDL.framework or libSDL.dylib, you as an application developer need to understand how to figure that out and specify that list.

On Windows, the situation gets worse. To work as a Windows executable, you need to bundle the Python interpreter, but unlike in an OS X application, you can’t just copy in a whole directory. So you end up needing a tool like PyInstaller or cx_Freeze. PyInstaller hasn’t seen a release in the last 2 years; it doesn’t support Python 3. It also doesn’t work: if I try to package the most basic Twisted program possible, with pyinstaller 2.1 I get “no module named zope.interface”, and if I try to package it with pyinstaller trunk, I get “no module named itertools”. cx_Freeze similarly can’t figure out how to include zope.interface no matter what I tell it to do. This problem isn’t specific to libraries that I use; most Python projects will run into it.

py2exe, on the other hand, only supports Python 3.3+, and so is unusable with a lot of important Python libraries.

For a GUI application for Linux, you might have some small hope of building a distro-specific package that users could install, but that would involve using a distro-specific toolchain that had nothing to do with Python, and you need to repeat that work for Debian, Ubuntu, Fedora, and whatever other distros you want to support.

All of these same tools are what I would use to build a stand-alone command-line executable for Windows, Mac, or Linux, and they all break down in similar ways.

In the mobile space, there is absolutely zero tooling included with the language to even get started there. It might be possible to use Kivy to get a build onto iOS or Android. I haven’t had an opportunity to test those. But they still require you to install Homebrew, and a C compiler, and a whole bunch of fairly specific platform tooling to get started, and there are lots of different ways that can go wrong.

So how do other languages stack up?

  1. In JavaScript, if you want an application in the browser, it’s as simple as ... writing some JavaScript.
  2. In JavaScript, if you want a desktop application, you can just grab Electron and be up and running in a few minutes.
  3. In JavaScript, if you want a command-line UNIX tool, you can grab nar and build something self-contained almost immediately.
  4. And of course, in Go, there’s no way to get anything but a fully functional self-contained executable at the end of the build process. Everything is fully redistributable by default.

As a community, Python needs a clear, well-documented, well-supported, modern way to produce build artifacts that are easy to create and easy to share. We need to have this for all popular platforms and the browser. This is a tricky problem: it requires knowledge of lots of fiddly build details.

This wheel has been re-invented, poorly, a dozen or so times. My list above was just a subset. In addition to py2app, py2exe, pyinstaller, cx_Freeze, and the Kivy bundling tools, we’ve also got terrarium, bbfreeze (which is unmaintained), pipsi, pex, and probably some others I don’t know about.

In order to compete with JavaScript and Go for developers’ attention, Python must be able to become an implementation detail and disappear when the user is running the program. This means that some of these tools (terrarium, pipsi, pex) are not suitable for this purpose because they are envelopes for deployment into an environment with an installed Python interpreter.

All of the tools I’m aware of that are trying to provide fully self-contained execution, though (pyinstaller, cx_freeze, bb-freeze, py2app) are poorly designed because they value optimized distribution size over actually working by default. Rather than reading setuptools metadata and discovering the full set of dependencies which have been declared to be required, all of these tools use weird AST-parsing heuristics and buggy path-traversal hacks to try to construct a guess as to the minimal set of files that might be required, then require the poor application developer to fill in the gaps. This means none of them work with namespace packages, none of them work properly with plugin systems or runtime configuration systems; generally, they don’t work correctly with late binding, which is one of Python’s greatest strengths. Of course, a full Python interpreter with the whole standard library is quite large. If we had a tool that worked well but produced very large executables, we could of course start adding an “optimized mode” to try to crunch things down for production.
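
For contrast, the metadata-driven approach could be as simple as the following sketch: ask pkg_resources (the runtime half of setuptools) for the full set of distributions a package declares it requires, rather than guessing from import statements. The “YourApp” name is just a placeholder for whatever distribution you are bundling.

# Minimal sketch of metadata-driven dependency discovery: resolve the
# transitive set of distributions declared via install_requires, instead of
# guessing with AST heuristics. "YourApp" is a placeholder name.
import pkg_resources

def declared_dependencies(dist_name):
    # pkg_resources.require() resolves everything needed to satisfy the
    # named distribution's declared requirements.
    return sorted(set(dist.project_name
                      for dist in pkg_resources.require(dist_name)))

if __name__ == "__main__":
    for name in declared_dependencies("YourApp"):
        print(name)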

And all this is to say nothing of the insanely intricate and detailed knowledge that every Python programmer eventually acquires about the C runtime semantics of their chosen platform. When a C compiler is required but missing, most tools still just emit tracebacks. When a shared library goes missing due to an OS upgrade or package removal, you just see whatever the dynamic linker thinks to report, with no explanation of how to fix it or what to do next.

The Python packaging ecosystem has made great strides in the last few years; Pip, in particular, has gone from a buggy and insecure mess to a mostly workable software delivery mechanism for developers. There are still bugs, but they are getting dealt with at a reasonable clip. However, Pip only delivers software to developers, and still requires you to have a Python runtime, a build environment, and tricky command-line tools to get things in place for development. The Python community has effectively no tools to deliver software to users.

To sum up, we need a tool which:

  1. works by default, including with “tricky” packages that use namespace packages, data files, and native dependencies
  2. produces useful, actionable error messages when something is missing and the build can’t be completed (like “you don’t have a C compiler installed” or “you need to install Homebrew and then brew install openssl”); see the sketch after this list
  3. can produce both command-line and GUI executables for Mac, Windows, and Linux (and, for bonus points, a web browser)
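
To make item 2 concrete, here is a minimal, hypothetical sketch of what “actionable” could look like when a C extension fails to import; the module name and the suggested remedies are invented purely for illustration.

# Hypothetical sketch: turn a bare ImportError from a C extension into an
# actionable hint. The module names and remedies below are examples only.
import platform
import sys

def import_with_hint(module_name, hints):
    try:
        return __import__(module_name)
    except ImportError as error:
        fix = hints.get(platform.system(), "check your build environment")
        sys.stderr.write(
            "Could not import %r (%s).\n"
            "Suggested fix on %s: %s\n"
            % (module_name, error, platform.system(), fix)
        )
        raise

# Example usage:
# import_with_hint("_ssl", {
#     "Darwin": "install Homebrew, then 'brew install openssl'",
#     "Linux": "install your distribution's OpenSSL development package",
# })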

The bad news is that I don’t have the time to start this project myself, and I’m not sure who does. The worse news is that every day we don’t have this, more and more people are re-writing their user-facing tools and applications in JavaScript or Go or Swift or Java, to suit their target platform, because it is honestly easier to learn an entirely new programming language and toolchain, and rewrite an entire application than to figure out how to build a self-contained executable in Python right now.

The good news, though, is that it’s a simple matter of programming, and all the core technologies for doing the really hard things that need to be done (pip, zipimport, and macholib, for example) already exist. The “simple matter of programming” is wiring together the metadata from setuptools, determining native dependencies with something like otool or ldd (or whatever the equivalent is on Windows, I still haven’t figured that out myself), pulling them all into a bundle, and tacking the Python interpreter on.
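
As a rough sketch of that native-dependency step, here is what asking otool or ldd could look like, assuming a Mac or Linux host; the output parsing is deliberately naive, and Windows remains the open question noted above.

# Rough sketch: list the shared libraries a compiled extension links against,
# via otool on OS X and ldd on Linux. Parsing is deliberately simplistic.
import subprocess
import sys

def linked_libraries(path):
    if sys.platform == "darwin":
        output = subprocess.check_output(["otool", "-L", path])
        # otool's first line names the file being inspected, so skip it.
        lines = output.decode("utf-8", "replace").splitlines()[1:]
    else:
        output = subprocess.check_output(["ldd", path])
        lines = output.decode("utf-8", "replace").splitlines()
    return [line.strip().split()[0] for line in lines if line.strip()]

if __name__ == "__main__":
    for library in linked_libraries(sys.argv[1]):
        print(library)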

Deploying Python Applications with Docker - A Suggestion

A template for deploying Python applications into Docker containers.

Deploying Python applications is much trickier than it should be.

Docker can simplify this, but even with Docker, there are a lot of nuances around how you package your Python application, how you build it, how you pull in your Python and non-Python dependencies, and how you structure your images.

I would like to share with you a strategy that I have developed for deploying Python apps that deals with a number of these issues. I don’t want to claim that this is the only way to deploy Python apps, or even a particularly right way; in the rapidly evolving containerization ecosystem, new techniques pop up every day, and everyone’s application is different. However, I humbly submit that this process is a good default.

Rather than equivocate further about its abstract goodness, here are some properties of the following container construction idiom:

  1. It reduces build times from a naive “sudo python setup.py install” by using Python wheels to cache repeatably built binary artifacts.
  2. It reduces container size by separating build containers from run containers.
  3. It is independent of other tooling, and should work fine with whatever configuration management or container orchestration system you want to use.
  4. It uses existing Python tooling of pip and virtualenv, and therefore doesn’t depend heavily on Docker. A lot of the same concepts apply if you have to build or deploy the same Python code into a non-containerized environment. You can also incrementally migrate towards containerization: if your deploy environment is not containerized, you can still build and test your wheels within a container and get the advantages of containerization there, as long as your base image matches the non-containerized environment you’re deploying to. This means you can quickly upgrade your build and test environments without having to upgrade the host environment on finicky continuous integration hosts, such as Jenkins or Buildbot.

To test these instructions, I used Docker 1.5.0 (via boot2docker, but hopefully that is an irrelevant detail). I also used an Ubuntu 14.04 base image (as you can see in the docker files) but hopefully the concepts should translate to other base images as well.

In order to show how to deploy a sample application, we’ll need a sample application to deploy; to keep it simple, here’s some “hello world” sample code using Klein:

# deployme/__init__.py
from klein import run, route

@route('/')
def home(request):
    request.setHeader("content-type", "text/plain")
    return 'Hello, world!'

def main():
    run("", 8081)

And an accompanying setup.py:

from setuptools import setup, find_packages

setup (
    name             = "DeployMe",
    version          = "0.1",
    description      = "Example application to be deployed.",
    packages         = find_packages(),
    install_requires = ["twisted>=15.0.0",
                        "klein>=15.0.0",
                        "treq>=15.0.0",
                        "service_identity>=14.0.0"],
    entry_points     = {'console_scripts':
                        ['run-the-app = deployme:main']}
)

Generating certificates is a bit tedious for a simple example like this one, but a real-life application is likely to face the deployment issue of native dependencies, so to demonstrate how to deal with that issue, this setup.py depends on the service_identity module, which pulls in cryptography (which depends on OpenSSL) and its dependency cffi (which depends on libffi).

To get started telling Docker what to do, we’ll need a base image that we can use for both build and run images, to ensure that certain things match up; particularly the native libraries that are built against. This also speeds up subsequent builds by giving a nice common point for caching.

In this base image, we’ll set up:

  1. a Python runtime (PyPy)
  2. the C libraries we need (the libffi6 and openssl ubuntu packages)
  3. a virtual environment in which to do our building and packaging
# base.docker
FROM ubuntu:trusty

RUN echo "deb http://ppa.launchpad.net/pypy/ppa/ubuntu trusty main" > \
    /etc/apt/sources.list.d/pypy-ppa.list

RUN apt-key adv --keyserver keyserver.ubuntu.com \
                --recv-keys 2862D0785AFACD8C65B23DB0251104D968854915
RUN apt-get update

RUN apt-get install -qyy \
    -o APT::Install-Recommends=false -o APT::Install-Suggests=false \
    python-virtualenv pypy libffi6 openssl

RUN virtualenv -p /usr/bin/pypy /appenv
RUN . /appenv/bin/activate; pip install pip==6.0.8

The apt options APT::Install-Recommends and APT::Install-Suggests are just there to prevent python-virtualenv from pulling in a whole C development toolchain with it; we’ll get to that stuff in the build container. In the run container, which is also based on this base container, we will just use virtualenv and pip to put the already-built artifacts into the right place. Ubuntu assumes that virtualenv and pip are purely development tools, which is why installing them would otherwise recommend pulling in Python development tooling as well.

You might wonder “why bother with a virtualenv if I’m already in a container”? This is belt-and-suspenders isolation, but you can never have too much isolation.

It’s true that in many cases, perhaps even most, simply installing stuff into the system Python with Pip works fine; however, for more elaborate applications, you may end up wanting to invoke a tool provided by your base container that is implemented in Python, but which requires dependencies managed by the host. By putting things into a virtualenv regardless, we keep the things set up by the base image’s package system tidily separated from the things our application is building, which means that there should be no unforeseen interactions, regardless of how complex the application’s usage of Python might be.

Next we need to build the base image, which is accomplished easily enough with a docker command like:

$ docker build -t deployme-base -f base.docker .;

Next, we need a container for building our application and its Python dependencies. The dockerfile for that is as follows:

# build.docker
FROM deployme-base

RUN apt-get install -qy libffi-dev libssl-dev pypy-dev
RUN . /appenv/bin/activate; \
    pip install wheel

ENV WHEELHOUSE=/wheelhouse
ENV PIP_WHEEL_DIR=/wheelhouse
ENV PIP_FIND_LINKS=/wheelhouse

VOLUME /wheelhouse
VOLUME /application

ENTRYPOINT . /appenv/bin/activate; \
           cd /application; \
           pip wheel .

Breaking this down, we first have it pulling from the base image we just built. Then, we install the development libraries and headers for each of the C-level dependencies we have to work with, as well as PyPy’s development toolchain itself. Then, to get ready to build some wheels, we install the wheel package into the virtualenv we set up in the base image. Note that the wheel package is only necessary for building wheels; the functionality to install them is built into pip.

Note that we then have two volumes: /wheelhouse, where the wheel output should go, and /application, where the application’s distribution (i.e. the directory containing setup.py) should go.

The entrypoint for this image is simply running “pip wheel” with the appropriate virtualenv activated. It runs against whatever is in the /application volume, so we could potentially build wheels for multiple different applications. In this example, I’m using pip wheel . which builds the current directory, but you may have a requirements.txt which pins all your dependencies, in which case you might want to use pip wheel -r requirements.txt instead.

At this point, we need to build the builder image, which can be accomplished with:

$ docker build -t deployme-builder -f build.docker .;

This builds a deployme-builder image that we can use to build the wheels for the application. Since building the wheels is a prerequisite for building the application container itself, let’s go ahead and do that now. In order to do so, we must tell the builder to use the current directory as the application being built (the volume at /application) and to put the wheels into a wheelhouse directory (one called wheelhouse will do):

$ mkdir -p wheelhouse;
$ docker run --rm \
         -v "$(pwd)":/application \
         -v "$(pwd)"/wheelhouse:/wheelhouse \
         deployme-builder;

After running this, if you look in the wheelhouse directory, you should see a bunch of wheels built there, including one for the application being built:

$ ls wheelhouse
DeployMe-0.1-py2-none-any.whl
Twisted-15.0.0-pp27-none-linux_x86_64.whl
Werkzeug-0.10.1-py2-none-any.whl
cffi-0.9.0-py2-none-any.whl
# ...

At last, time to build the application container itself. The setup for that is very short, since most of the work has already been done for us in the production of the wheels:

# run.docker
FROM deployme-base

ADD wheelhouse /wheelhouse
RUN . /appenv/bin/activate; \
    pip install --no-index -f wheelhouse DeployMe

EXPOSE 8081

ENTRYPOINT . /appenv/bin/activate; \
           run-the-app

During build, this dockerfile pulls from our shared base image, then adds the wheelhouse we just produced as a directory at /wheelhouse. The only shell command that needs to run in order to get the wheels installed is pip install TheApplicationYouJustBuilt, with two options: --no-index, to tell pip “don’t bother downloading anything from PyPI, everything you need should be right here”, and -f wheelhouse, which tells it where “here” is.

The entrypoint for this one activates the virtualenv and invokes run-the-app, the setuptools entrypoint defined above in setup.py, which should be on the $PATH once that virtualenv is activated.
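
For reference, the run-the-app script that setuptools generates behaves roughly like this tiny wrapper; this is a sketch, not the literal generated file, which goes through the pkg_resources entry-point machinery.

# Illustrative sketch of what the generated "run-the-app" console script does:
# import the module named before the colon in the entry point, call the
# attribute named after it, and exit with its return value.
import sys

from deployme import main

if __name__ == "__main__":
    sys.exit(main())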

The application build is very simple, just

$ docker build -t deployme-run -f run.docker .;

to build the image from that dockerfile.

Similarly, running the application is just like any other docker container:

$ docker run --rm -it -p 8081:8081 deployme-run

You can then hit port 8081 on your docker host to load the application.
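
As a quick smoke test, something like the following should print the greeting; it assumes the docker host is reachable as localhost, which may not be the case with boot2docker (substitute the VM’s IP address there instead).

# Quick smoke test for the running container. Assumes the docker host is
# "localhost"; substitute your boot2docker VM's address if necessary.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

print(urlopen("http://localhost:8081/").read())  # expected: Hello, world!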

The command-line for docker run here is just an example; I’m passing --rm, for instance, so that if you run this example it won’t clutter up your container list. Your environment will have its own conventions for invoking docker run, for mapping VOLUMEs and EXPOSEd ports, and for orchestrating containers, all of which are out of scope for this post; you can pretty much run it however you like. Everything the image needs is built in at this point.

To review:

  1. have a common base container that contains all your non-Python (C libraries and utilities) dependencies. Avoid installing development tools here.
  2. use a virtualenv even though you’re in a container to avoid any surprises from the host Python environment
  3. have a “build” container that just makes the virtualenv and puts wheel and pip into it, and runs pip wheel
  4. run the build container with your application code in a volume as input and a wheelhouse volume as output
  5. create an application container by starting from the same base image and, once again not installing any dev tools, pip install all the wheels that you just built, turning off access to PyPI for that installation so it goes quickly and deterministically based on the wheels you’ve built.

While this sample application uses Twisted, it’s quite possible to apply this same process to just about any Python application you want to run inside Docker.

I’ve put a sample project up on Github which contains all the files referenced here, as well as “build” and “run” shell scripts that combine the necessary docker command lines to go through the full process to build and run this sample app. While it defaults to the PyPy runtime (as most networked Python apps generally should these days, since its performance is so much better than CPython’s), if you have an application with a hard CPython dependency, I’ve also made a branch and pull request on that project for CPython, and you can look at the relatively minor patch required to get it working for CPython as well.

Now that you have a container with an application in it that you might want to deploy, my previous write-up on a quick way to securely push stuff to a production service might be of interest.

(Once again, thanks to my employer, Rackspace, for sponsoring the time for me to write this post. Thanks also to Shawn Ashlee and Jesse Spears for helping me refine these ideas and listening to me rant about them. However, that expression of gratitude should not be taken as any kind of endorsement from any of these parties as to my technical suggestions or opinions here, as they are entirely my own.)