The “Dependency Cutout” Workflow Pattern, Part I

It’s important to be able to fix bugs in your open source dependencies, and not just work around them.

Tell me if you’ve heard this one before.

You’re working on an application. Let’s call it “FooApp”. FooApp has a dependency on an open source library, let’s call it “LibBar”. You find a bug in LibBar that affects FooApp.

To envisage the best possible version of this scenario, let’s say you actively like LibBar, both technically and socially. You’ve contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar’s release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.

What do you do?

There are a few common reactions to this type of scenario, all of which are bad options.

I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:

  1. Find an alternative to LibBar, and switch to it.

    This is a bad idea because a transition to a core infrastructure component could be extremely expensive.

  2. Vendor LibBar into your codebase and fix your vendored version.

    This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo1: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.

  3. Monkey-patch LibBar to include your fix.

    This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you’re inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.

  4. Implement a workaround in your own code, rather than fixing it.

    This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar’s job, and unless you have a full wrapper for it in your own codebase, other engineers (including “yourself, personally”) might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.

  5. Implement the fix upstream in LibBar anyway, because that’s the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.

    This is a bad idea because you are betraying your users — by allowing the buggy behavior to persist — for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.

    As much as it’s nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.

    Even if we only care about the open source maintainers here, there’s still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.

    Much as it’s good to contribute the fix, it’s not great to put this on the maintainers.

The respective incentive structures of software development — specifically, of corporate application development and open source infrastructure development — make options 1-4 very common.

On the corporate / application side, these issues are:

  • it’s difficult for corporate developers to get clearance to spend even small amounts of their work hours on upstream open source projects, but clearance to spend time on the project they actually work on is implicit. If it takes 3 hours of wrangling with Legal2 and 3 hours of implementation work to fix the issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of implementation work in FooApp, a FooApp developer will often perceive it as “easier” to fix the issue downstream.

  • it’s difficult for corporate developers to get clearance from management to spend even small amounts of money sponsoring upstream reviewers, so even if they can find the time to contribute the fix, chances are high that it will remain stuck in review unless they are personally well-integrated members of the LibBar development team already.

  • even assuming there’s zero pressure whatsoever to avoid open sourcing the upstream changes, there’s still the fact inherent to any development team that FooApp’s developers will be more familiar with FooApp’s codebase and development processes than they are with LibBar’s. It’s just easier to work there, even if all other things are equal.

  • systems for tracking risk from open source dependencies often lack visibility into vendoring, particularly if you’re doing a hybrid approach and only vendoring a few things to address work in progress, rather than a comprehensive and disciplined approach to a monorepo. If you fully absorb a vendored dependency and then modify it, Dependabot isn’t going to tell you that a new version is available any more, because it won’t be present in your dependency list. Organizationally this is bad of course but from the perspective of an individual developer this manifests mostly as fewer annoying emails.

But there are problems on the open source side as well. Those problems are all derived from one big issue: because we’re often working with relatively small sums of money, it’s hard for upstream open source developers to consume either money or patches from application developers. It’s nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person’s full-time job to worry about this stuff, there’s often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.

These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don’t reliably contribute back.

The Answer We Want

All those options are bad. If we had a good option, what would it look like?

It is both practically necessary3 and morally required4 for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.

Below, I will describe a desirable abstract workflow for achieving this goal.

Step 0: Report the Problem

Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue; specifically, in contrast to writing it up as a pull request. Describe the problem before submitting a solution.

You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you’re planning on doing.

If you don’t hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.

If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don’t consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you’ve misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.

Step 1: Source Code and CI Setup

Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application’s use of the dependency. After all, you might want to fix more than one bug in LibBar.

You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you’ll need to have your own main branch that diverges from your upstream’s main branch.

Remember: you’re going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.

Depending on your LibBar’s own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current5 main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).

Ideally for projects with really good CI and a strong “keep main release-ready at all times” policy, you can deploy straight from a development branch, but it’s good to take a moment to consider this before you get started. It’s usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.

Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.

Optional Bonus Step 1a: Artifact Management

If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you’ve already gone to the trouble of having an in-house package repository, you can pretend you’ve taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.

Step 2: Do The Fix

Now that you’ve got somewhere to edit LibBar’s code, you will want to actually fix the bug.

Step 2a: Local Filesystem Setup

Before you have a production version on your own deployed branch, you’ll want to test locally, which means having both repositories in a single integrated development environment.

At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.

This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you’ll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you’ll also want to be able to use FooApp to exercise the code and see if you’ve missed anything - and this, you wouldn’t get in CI, since LibBar can’t depend on FooApp itself.

In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.

Step 2b: Branch Setup for PR

However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp’s branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it’s passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.

At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.

Until it’s addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you’re going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.

Step 3: Deploy Internally

Now that you’re confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn’t going to break anything on your site, it’s time to deploy.

Some deployment heritage should help to provide some evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn’t necessarily emblematic of that of all LibBar users.

Step 4: Propose Externally

You’ve got the fix, you’ve tested the fix, you’ve got the fix in your own production, you’ve told upstream you want to send them some changes. Now, it’s time to make the pull request.

You’re likely going to get some feedback on the PR, even if you think it’s already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you’ll need to address before LibBar’s maintainers can land it.

As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.

Step 4a: Hurry Up And Wait

If you’re lucky, upstream will land your changes to LibBar. But, there’s still no release version available. Here, you’ll have to stay in a holding pattern until upstream can finalize the release on their end.

Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pinned release version to a git hash of the LibBar version where your fix landed, in their repository.

Before you do this, check in with the LibBar core team and make sure that they understand that’s what you’re doing and they don’t have any wacky workflows which may involve rebasing or eliding that commit as part of their release process.

Step 5: Unwind Everything

Finally, you eventually want to stop carrying any patches and move back to an official released version that integrates your fix.

You want to do this because this is what the upstream will expect when you are reporting bugs. Part of the benefit of using open source is benefiting from the collective work to do bug-fixes and such, so you don’t want to be stuck off on a pinned git hash that the developers do not support for anyone else.

As I said in step 2b6, make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around is something that can potentially create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state at this point as well.

Up Next

This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I’ll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!


  1. if you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. but chances are you don’t have that, very few companies do. 

  2. In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate. 

  3. c.f. @mcc@mastodon.social 

  4. c.f. @geofft@mastodon.social 

  5. In an ideal world every project would keep its main branch ready to release at all times, no matter what but we do not live in an ideal world. 

  6. In this case, there is no question. It’s 2b only, no not-2b. 

Software Needs To Be More Expensive

Software, like coffee, is too artificially cheap, and we need to make it more expensive. I have one suggestion for how to do that.

The Cost of Coffee

One of the ideas that James Hoffmann — probably the most influential… influencer in the coffee industry — works hard to popularize is that “coffee needs to be more expensive”.

The coffee industry is famously exploitative. Despite relatively thin margins for independent café owners1, there are no shortage of horrific stories about labor exploitation and even slavery in the coffee supply chain.

To summarize a point that Mr. Hoffman has made over a quite long series of videos and interviews2, some of this can be fixed by regulatory efforts. Enforcement of supply chain policies both by manufacturers and governments can help spot and avoid this type of exploitation. Some of it can be fixed by discernment on the part of consumers. You can try to buy fair-trade coffee, avoid brands that you know have problematic supply-chain histories.

Ultimately, though, even if there is perfect, universal, zero-cost enforcement of supply chain integrity… consumers still have to be willing to, you know, pay more for the coffee. It costs more to pay wages than to have slaves.

The Price of Software

The problem with the coffee supply chain deserves your attention in its own right. I don’t mean to claim that the problems of open source maintainers are as severe as those of literal child slaves. But the principle is the same.

Every tech company uses huge amounts of open source software, which they get for free.

I do not want to argue that this is straightforwardly exploitation. There is a complex bargain here for the open source maintainers: if you create open source software, you can get a job more easily. If you create open source infrastructure, you can make choices about the architecture of your projects which are more long-term sustainable from a technology perspective, but would be harder to justify on a shorter-term commercial development schedule. You can collaborate with a wider group across the industry. You can build your personal brand.

But, in light of the recent xz Utils / SSH backdoor scandal, it is clear that while the bargain may not be entirely one-sided, it is not symmetrical, and significant bad consequences may result, both for the maintainers themselves and for society.

To fix this problem, open source software needs to get more expensive.

A big part of the appeal of open source is its implicit permission structure, which derives both from its zero up-front cost and its zero marginal cost.

The zero up-front cost means that you can just get it to try it out. In many companies, individual software developers do not have the authority to write a purchase order, or even a corporate credit card for small expenses.

If you are a software engineer and you need a new development tool or a new library that you want to purchase for work, it can be a maze of bureaucratic confusion in order to get that approved. It might be possible, but you are likely to get strange looks, and someone, probably your manager, is quite likely to say “isn’t there a free option for this?” At worst, it might just be impossible.

This makes sense. Dealing with purchase orders and reimbursement requests is annoying, and it only feels worth the overhead if you’re dealing with a large enough block of functionality that it is worth it for an entire team, or better yet an org, to adopt. This means that most of the purchasing is done by management types or “architects”, who are empowered to make decisions for larger groups.

When individual engineers need to solve a problem, they look at open source libraries and tools specifically because it’s quick and easy to incorporate them in a pull request, where a proprietary solution might be tedious and expensive.

That’s assuming that a proprietary solution to your problem even exists. In the infrastructure sector of the software economy, free options from your operating system provider (Apple, Microsoft, maybe Amazon if you’re in the cloud) and open source developers, small commercial options have been marginalized or outright destroyed by zero-cost options, for this reason.

If the zero up-front cost is a paperwork-reduction benefit, then the zero marginal cost is almost a requirement. One of the perennial complaints of open source maintainers is that companies take our stuff, build it into a product, and then make a zillion dollars and give us nothing. It seems fair that they’d give us some kind of royalty, right? Some tiny fraction of that windfall? But once you realize that individual developers don’t have the authority to put $50 on a corporate card to buy a tool, they super don’t have the authority to make a technical decision that encumbers the intellectual property of their entire core product to give some fraction of the company’s revenue away to a third party. Structurally, there’s no way that this will ever happen.

Despite these impediments, keeping those dependencies maintained does cost money.

Some Solutions Already Exist

There are various official channels developing to help support the maintenance of critical infrastructure. If you work at a big company, you should probably have a corporate Tidelift subscription. Maybe ask your employer about that.

But, as they will readily admit there are a LOT of projects that even Tidelift cannot cover, with no official commercial support, and no practical way to offer it in the short term. Individual maintainers, like yours truly, trying to figure out how to maintain their projects, either by making a living from them or incorporating them into our jobs somehow. People with a Ko-Fi or a Patreon, or maybe just an Amazon wish-list to let you say “thanks” for occasional maintenance work.

Most importantly, there’s no path for them to transition to actually making a living from their maintenance work. For most maintainers, Tidelift pays a sub-hobbyist amount of money, and even setting it up (and GitHub Sponsors, etc) is a huge hassle. So even making the transition from “no income” to “a little bit of side-hustle income” may be prohibitively bureaucratic.

Let’s take myself as an example. If you’re a developer who is nominally profiting from my infrastructure work in your own career, there is a very strong chance that you are also a contributor to the open source commons, and perhaps you’ve even contributed more to that commons than I have, contributed more to my own career success than I have to yours. I can ask you to pay me3, but really you shouldn’t be paying me, your employer should.

What To Do Now: Make It Easy To Just Pay Money

So if we just need to give open source maintainers more money, and it’s really the employers who ought to be giving it, then what can we do?

Let’s not make it complicated. Employers should just give maintainers money. Let’s call it the “JGMM” benefit.

Specifically, every employer of software engineers should immediately institute the following benefits program: each software engineer should have a monthly discretionary budget of $50 to distribute to whatever open source dependency developers they want, in whatever way they see fit. Venmo, Patreon, PayPal, Kickstarter, GitHub Sponsors, whatever, it doesn’t matter. Put it on a corp card, put the project name on the line item, and forget about it. It’s only for open source maintenance, but it’s a small enough amount that you don’t need intense levels of approval-gating process. You can do it on the honor system.

This preserves zero up-front cost. To start using a dependency, you still just use it4. It also preserves zero marginal cost: your developers choose which dependencies to support based on perceived need and popularity. It’s a fixed overhead which doesn’t scale with revenue or with profit, just headcount.

Because the whole point here is to match the easy, implicit, no-process, no-controls way in which dependencies can be added in most companies. It should be easier to pay these small tips than it is to use the software in the first place.

This sub-1% overhead to your staffing costs will massively de-risk the open source projects you use. By leaving the discretion up to your engineers, you will end up supporting those projects which are really struggling and which your executives won’t even hear about until they end up on the news. Some of it will go to projects that you don’t use, things that your engineers find fascinating and want to use one day but don’t yet depend upon, but that’s fine too. Consider it an extremely cheap, efficient R&D expense.

A lot of the options for developers to support open source infrastructure are already tax-deductible, so if they contribute to something like one of the PSF’s fiscal sponsorees, it’s probably even more tax-advantaged than a regular business expense.

I also strongly suspect that if you’re one of the first employers to do this, you can get a round of really positive PR out of the tech press, and attract engineers, so, the race is on. I don’t really count as the “tech press” but nevertheless drop me a line to let me know if your company ends up doing this so I can shout you out.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics such as “How do I figure out which open source projects to give money to?”.


  1. I don’t have time to get into the margins for Starbucks and friends, their relationship with labor, economies of scale, etc. 

  2. While this is a theme that pervades much of his work, the only place I can easily find where he says it in so many words is on a podcast that sometimes also promotes right-wing weirdos and pseudo-scientific quacks spreading misinformation about autism and ADHD. So, I obviously don’t want to link to them; you’ll have to take my word for it. 

  3. and I will, since as I just recently wrote about, I need to make sure that people are at least aware of the option 

  4. Pending whatever legal approval program you have in place to vet the license. You do have a nice streamlined legal approvals process, right? You’re not just putting WTFPL software into production, are you? 

The Hat

If you want me to keep doing… whatever this is… now’s the time to support it!

This year I will be going to two conferences: PyCon 2024, and North Bay Python 2024.

At PyCon, I will be promoting my open source work and my writing on this blog. As I’m not giving a talk this year, I am looking forward to organizing and participating in some open spaces about topics like software architecture, open source sustainability, framework interoperability, and event-driven programming.

At North Bay Python, though, I will either be:

  1. doing a lot more of that, or
  2. looking for new full-time employment, pausing the patreon, and winding down this experiment.

If you’d like to see me doing the former, now would be a great time to sign up to my Patreon to support the continuation of my independent open source work and writing.

The Bad News

It has been nearly a year since I first mentioned that I have a Patreon on this blog. That year has been a busy one, with consulting work and personal stuff consuming more than enough time that I have never been full time on independent coding & blogging. As such, I’ve focused more on small infrastructure projects and less on user-facing apps than I’d like, but I have spent the plurality of my time on it.

For that to continue, let alone increase, this work needs to—at the very least—pay for health insurance. At my current consulting rate, a conservative estimate based on some time tracking is that I am currently doing this work at something like a 98.5% discount. I do love doing it! I would be happy to continue doing it at a significant discount! But “significant” and “catastrophic” are different thresholds.

As I have said previously, this is an appeal to support my independent work; not to support me. I will be fine; what you will be voting for with your wallet is not my well-being but a choice about where I spend my time.

Hiding The Hat

When people ask me what I do these days, I sometimes struggle to explain. It is confusing. I might say I have a Patreon, I do a combination of independent work and consulting, or if I’m feeling particularly sarcastic I might say I’m an ✨influencer✨. But recently I saw the very inspiring Matt Ricardo describing the way he thinks about his own Patreon, and I realized what I am actually trying to do, which is software development busking.

Previously, when I’ve been thinking about making this “okay, it’s been a year of Patreon, let’s get serious now” post, I’ve been thinking about adding more reward products to my Patreon, trying to give people better value for their money before asking for anything more, trying to finish more projects to make a better sales pitch, maybe making merch available for sale, and so on. So aside from irregular weekly posts on Friday and acknowledgments sections at the bottom of blog posts, I’ve avoided mentioning this while I think about adding more private rewards.

But busking is a public performance, and if you want to support my work then it is that public aspect that you presumably want to support. And thus, an important part of the busking process is to actually pass the hat at the end. The people who don’t chip in still get to see the performance, but everyone else need to know that they can contribute if they liked it.1

I’m going to try to stop hiding the metaphorical hat. I still don’t want to overdo it, but I will trust that you’ll tell me if these reminders get annoying. For my part today, in addition to this post, I’m opening up a new $10 tier on Patreon for people who want to provide a higher level of support, and officially acknowledging the rewards that I already provide.

What’s The Deal?

So, what would you be supporting?

What You Give (The Public Part)

  1. I have tended to focus on my software, and there has been a lot of it! You’d be supporting me writing libraries and applications and build infrastructure to help others do the same with Python, as well as maintaining existing libraries (like the Twisted ecosystem libraries) sometimes. If I can get enough support together to more than bare support for myself, I’d really like to be able to do things like pay people to others to help with aspects of applications that I would struggle to complete myself, like accessibility or security audits.
  2. I also do quite a bit of writing though, about software and about other things. To make it easier to see what sort of writing I’m talking about, I’ve collected just the stuff that I’ve written during the period where I have had some patrons, under the supported tag.
  3. Per my earlier sarcastic comment about being an “influencer”, I also do quite a bit of posting on Mastodon about software and the tech industry.

What You Get (Just For Patrons)

As I said above, I will be keeping member benefits somewhat minimal.

  1. I will add you to SponCom so that your name will be embedded in commit messages like this one on a cadence appropriate to your support level.
  2. You will get access to my private Patreon posts where I discuss what I’ve been working on. As one of my existing patrons put it:

    I figure I’m getting pretty good return on it, getting not only targeted project tracking, but also a peek inside your head when it comes to Sores Business Development. Maybe some of that stuff will rub off on me :)

  3. This is a somewhat vague and noncommittal benefit, but it might be the best one: if you are interested in the various different projects that I am doing, you can help me prioritize! I have a lot of things going on. What would you prefer that I focus on? You can make suggestions in the comments of Patreon posts, which I pay a lot more attention to than other feedback channels.
  4. In the near future2 I am also planning to start doing some “office hours” type live-streaming, where I will take audience questions and discuss software design topics, or maybe do some live development to showcase my process and the tools I use. When I figure out the mechanics of this, I plan to add some rewards to the existing tiers to select topics or problems for me to work on there.

If that sounds like a good deal to you, please sign up now. If you’re already supporting me, sharing this and giving a brief testimonial of something good I’ve done would be really helpful. Github is not an algorithmic platform like YouTube, despite my occasional jokey “remember to like and subscribe”, nobody is getting recommended DBXS, or Fritter, or Twisted, or Automat, or this blog unless someone goes out and shares it.


  1. A year into this, after what feels like endlessly repeating this sales pitch to the point of obnoxiousness, I still routinely interact with people who do not realize that I have a Patreon at all. 

  2. Not quite comfortable putting this on the official patreon itemized inventory of rewards yet, but I do plan to add it once I’ve managed to stream for a couple of weeks in a row. 

Telemetry Is Not Your Enemy

Not all data collection is the same, and not all of it is bad.

Part 1: A Tale of Two Metaphors

In software development “telemetry” is data collected from users of the software, almost always delivered to the authors of the software via the Internet.

In recent years, there has been a great deal of angry public discourse about telemetry. In particular, there is a lot of concern that every software vendor and network service operator collecting any data at all is spying on its users, surveilling every aspect of our lives. The media narrative has been that any tech company collecting data for any purpose is acting creepy as hell.

I am quite sympathetic to this view. In general, some concern about privacy is warranted whenever some new data-collection scheme is proposed. However it seems to me that the default response is no longer “concern and skepticism”; but rather “panic and fury”. All telemetry is seen as snooping and all snooping is seen as evil.

There’s a sense in which software telemetry is like surveillance. However, it is only like surveillance. Surveillance is a metaphor, not a description. It is far from a perfect metaphor.

In the discourse around user privacy, I feel like we have lost a lot of nuance about the specific details of telemetry when some people dismiss all telemetry as snooping, spying, or surveillance.

Here are some ways in which software telemetry is not like “snooping”:

  1. The data may be aggregated. The people consuming the results of telemetry are rarely looking at individual records, and individual records may not even exist in some cases. There are tools, like Prio, to do this aggregation to be as privacy-sensitive as possible.
  2. The data is rarely looked at by human beings. In the cases (such as ad-targeting) where the data is highly individuated, both the input (your activity) and the output (your recommendations) are both mainly consumed by you, in your experience of a product, by way of algorithms acting upon the data, not by an employee of the company you’re interacting with.1
  3. The data is highly specific. “Here’s a record with your account ID and the number of times you clicked the Add To Cart button without checking out” is not remotely the same class of information as “Here’s several hours of video and audio, attached to your full name, recorded without your knowledge or consent”. Emotional appeals calling any data “surveillance” tend to suggest that all collected data is the latter, where in reality most of it is much closer to the former.

There are other metaphors which can be used to understand software telemetry. For example, there is also a sense in which it is like voting.

I emphasize that voting is also a metaphor here, not a description. I will also freely admit that it is in many ways a worse metaphor for telemetry than “surveillance”. But it can illuminate other aspects of telemetry, the ones that the surveillance metaphor leaves out.

Data-collection is like voting because the data can represent your interests to a party that has some power over you. Your software vendor has the power to change your software, and you probably don’t, either because you don’t have access to the source code. Even if it’s open source, you almost certainly don’t have the resources to take over its maintenance.

For example, let’s consider this paragraph from some Microsoft documentation about telemetry:

We also use the insights to drive improvements and intelligence into some of our management and monitoring solutions. This improvement helps customers diagnose quality issues and save money by making fewer support calls to Microsoft.

“Examples of how Microsoft uses the telemetry data” from the Azure SDK documentation

What Microsoft is saying here is that they’re collecting the data for your own benefit. They’re not attempting to justify it on the basis that defenders of law-enforcement wiretap schemes might. Those who want literal mass surveillance tend to justify it by conceding that it might hurt individuals a little bit to be spied upon, but if we spy on everyone surely we can find the bad people and stop them from doing bad things. That’s best for society.

But Microsoft isn’t saying that.2 What Microsoft is saying here is that if you’re experiencing a problem, they want to know about it so they can fix it and make the experience better for you.

I think that is at least partially true.

Part 2: I Qualify My Claims Extensively So You Jackals Don’t Lose Your Damn Minds On The Orange Website

I was inspired to write this post due to the recent discussions in the Go community about how to collect telemetry which provoked a lot of vitriol from people viscerally reacting to any telemetry as invasive surveillance. I will therefore heavily qualify what I’ve said above to try to address some of that emotional reaction in advance.

I am not suggesting that we must take Microsoft (or indeed, the Golang team) fully at their word here. Trillion dollar corporations will always deserve skepticism. I will concede in advance that it’s possible the data is put to other uses as well, possibly to maximize profits at the expense of users. But it seems reasonable to assume that this is at least partially true; it’s not like Microsoft wants Azure to be bad.

I can speak from personal experience. I’ve been in professional conversations around telemetry. When I have, my and my teams’ motivations were overwhelmingly focused on straightforwardly making the user experience good. We wanted it to be good so that they would like our products and buy more of them.

It’s hard enough to do that without nefarious ulterior motives. Most of the people who develop your software just don’t have the resources it takes to be evil about this.

Part 3: They Can’t Help You If They Can’t See You

With those qualifications out of the way, I will proceed with these axioms:

  1. The developers of software will make changes to it.
  2. These changes will benefit some users.
  3. Which changes the developers select will be derived, at least in part, from the information that they have.
  4. At least part of the information that the developers have is derived from the telemetry they collect.

If we can agree that those axioms are reasonable, then let us imagine two user populations:

  • Population A is privacy-sensitive and therefore sees telemetry as bad, and opts out of everything they possibly can.
  • Population B doesn’t care about privacy, and therefore ignores any telemetry and blithely clicks through any opt-in.

When the developer goes to make changes, they will have more information about Population B. Even if they’re vaguely aware that some users are opting out (or refusing to opt in), the developer will know far less about Population A. This means that any changes the developer makes will not serve the needs of their privacy-conscious users, which means fewer features that respect privacy as time goes on.

Part 4: Free as in Fact-Free Guesses

In the world of open source software, this problem is even worse. We often have fewer resources with which to collect and analyze telemetry in the first place, and when we do attempt to collect it, a vocal minority among those users are openly hostile, with feedback that borders on harassment. So we often have no telemetry at all, and are making changes based on guesses.

Meanwhile, in proprietary software, the user population is far larger and less engaged. Developers are not exposed directly to users and therefore cannot be harassed or intimidated into dropping their telemetry. Which means that proprietary software gains a huge advantage: they can know what most of their users want, make changes to accommodate it, and can therefore make a product better than the one based on uninformed guesses from the open source competition.

Proprietary software generally starts out with a panoply of advantages already — most of which boil down to “money” — but our collective knee-jerk reaction to any attempt to collect telemetry is a massive and continuing own-goal on the part of the FLOSS community. There’s no inherent reason why free software’s design cannot be based on good data, but our community’s history and self-selection biases make us less willing to consider it.

That does not mean we need to accept invasive data collection that is more like surveillance. We do not need to allow for stockpiled personally-identifiable information about individual users that lives forever. The abuses of indiscriminate tech data collection are real, and I am not suggesting that we forget about them.

The process for collecting telemetry must be open and transparent, the data collected needs to be continuously vetted for safety. Clear data-retention policies should always be in place to avoid future unanticipated misuses of data that is thought to be safe today but may be de-anonymized or otherwise abused in the future.

I want the collaborative feedback process of open source development to result in this kind of telemetry: thoughtful, respectful of user privacy, and designed with the principle of least privilege in mind. If we have this kind of process, then we could hold it up as an example for proprietary developers to follow, and possibly improve the industry at large.

But in order to be able to produce that example, we must produce criticism of telemetry efforts that is specific, grounded in actual risks and harms to users, rather than a series of emotional appeals to slippery-slope arguments that do not correspond to the actual data being collected. We must arrive at a consensus that there are benefits to users in allowing software engineers to have enough information to do their jobs, and telemetry is not uniformly bad. We cannot allow a few users who are complaining to stop these efforts for everyone.

After all, when those proprietary developers look at the hard data that they have about what their users want and need, it’s clear that those who are complaining don’t even exist.


  1. Please note that I’m not saying that this automatically makes such collection ethical. Attempting to modify user behavior or conduct un-reviewed psychological experiments on your customers is also wrong. But it’s wrong in a way that is somewhat different than simply spying on them. 

  2. I am not suggesting that data collected for the purposes of improving the users’ experience could not be used against their interest, whether by law enforcement or by cybercriminals or by Microsoft itself. Only that that’s not what the goal is here. 

What Would You Say You Do Here?

A brief description of the various projects that I am hoping to do independently, with your support. In other words, this is an ad, for me.

What have I been up to?

Late last year, I launched a Patreon. Although not quite a “soft” launch — I did toot about it, after all — I didn’t promote it very much.

I started this way because I realized that if I didn’t just put something up I’d be dithering forever. I’d previously been writing a sprawling monster of an announcement post that went into way too much detail, and kept expanding to encompass more and more ideas until I came to understand that salvaging it was going to be an editing process just as brutal and interminable as the writing itself.

However, that post also included a section where I just wrote about what I was actually doing.

So, for lots of reasons1, there are a diverse array of loosely related (or unrelated) projects below which may not get finished any time soon. Or, indeed, may go unfinished entirely. Some are “done enough” now, and just won’t receive much in the way of future polish.

That is an intentional choice.

The rationale, as briefly as I can manage, is: I want to lean into the my strength2 of creative, divergent thinking, and see how these ideas pan out without committing to them particularly intensely. My habitual impulse, for many years, has been to lean extremely hard on strategies that compensate for my weaknesses in organization, planning, and continued focus, and attempt to commit to finishing every project to prove that I’ll never flake on anything.

While the reward tiers for the Patreon remain deliberately ambiguous3, I think it would be fair to say that patrons will have some level of influence in directing my focus by providing feedback on these projects, and requesting that I work more on some and less on others.

So, with no further ado: what have I been working on, and what work would you be supporting if you signed up? For each project, I’ll be answering 3 questions:

  1. What is it?
  2. What have I been doing with it recently?
  3. What are my plans for it?

This. i.e. blog.glyph.im

What is it?

For starters, I write stuff here. I guess you’re reading this post for some reason, so you might like the stuff I write? I feel like this doesn’t require much explanation.

What have I done with it recently?

You might appreciate the explicitly patron-requested Potato Programming post, a screed about dataclass, or a deep dive on the difficulties of codesigning and notarization on macOS along with an announcement of a tool to remediate them.

What are my plans for it?

You can probably expect more of the same; just all the latest thoughts & ideas from Glyph.

Twisted

What is it?

If you know of me you probably know of me as “the Twisted guy” and yeah, I am still that. If, somehow, you’ve ended up here and you don’t know what it is, wow, that’s cool, thanks for coming, super interested to know what you do know me for.

Twisted is an event-driven networking engine written in Python, the precursor and inspiration for the asyncio module, and a suite of event-driven programming abstractions, network protocol implementations, and general utility code.

What have I done with it recently?

I’ve gotten a few things merged, including type annotations for getPrimes and making the bundled CLI OpenSSH server replacement work at all with public key authentication again, as well as some test cleanups that reduce the overall surface area of old-style Deferred-returning tests that can be flaky and slow.

I’ve also landed a posix_spawnp-based spawnProcess implementation which speed up process spawning significantly; this can be as much as 3x faster if you do a lot of spawning of short-running processes.

I have a bunch of PRs in flight, too, including better annotations for FilePath Deferred, and IReactorProcess, as well as a fix for the aforementioned posix_spawnp implementation.

What are my plans for it?

A lot of the projects below use Twisted in some way, and I continue to maintain it for my own uses. My particular focus is in quality-of-life improvements; issues that someone starting out with a Twisted project will bump into and find confusing or difficult. I want it to be really easy to write applications with Twisted and I want to use my own experiences with it.

I also do code reviews of other folks’ contributions; we do still have over 100 open PRs right now.

DateType

What is it?

DateType is a workaround for a very specific bug in the way that the datetime standard library module deals with type composition: to wit, that datetime is a subclass of date but is not Liskov-substitutable for it. There are even #type:ignore comments in the standard library type stubs to work around this problem, because if you did this in your own code, it simply wouldn’t type-check.

What have I done with it recently?

I updated it a few months ago to expose DateTime and Time directly (as opposed to AwareDateTime and NaiveDateTime), so that users could specialize their own functions that took either naive or aware times without ugly and slightly-incorrect unions.

What are my plans for it?

This library is mostly done for the time being, but if I had to polish it a bit I’d probably do two things:

  1. a readthedocs page for nice documentation
  2. write a PEP to get this integrated into the standard library

Although the compatibility problems are obviously very tricky and a PEP would probably be controversial, this is ultimately a bug in the stdlib, and should be fixed upstream there.

Automat

What is it?

It’s a library to make deterministic finite-state automata easier to create and work with.

What have I done with it recently?

Back in the middle of last year, I opened a PR to create a new, completely different front-end API for state machine definition. Instead of something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class MachineExample:
    machine = MethodicalMachine()

    @machine.state()
    def a_state(self): ...

    @machine.state()
    def other_state(self): ...

    @machine.input()
    def flip(self): ...

    @machine.output()
    def _do_flip(self): return ...

    on.upon(flip, enter=off, outputs=[_do_flip], collector=list)
    off.upon(flip, enter=on, outputs=[_do_flip], collector=list)

this branch lets you instead do something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
class MachineProtocol(Protocol):
    def flip(self) -> None: ...

class MachineCore: ...

def buildCore() -> MachineCore: ...
machine = TypicalBuilder(MachineProtocol, buildCore)

@machine.state()
class _OffState:
    @machine.handle(MachineProtocol.flip, enter=lambda: _OnState)
    def flip(self) -> None: ...

@machine.state()
class _OnState:
    @machine.handle(MachineProtocol.flip, enter=lambda: _OffState)
    def flip(self) -> None: ...

MachineImplementation = machine.buildClass()

In other words, it creates a state for every type, and type safety that much more cleanly expresses what methods can be called and by whom; no need to make everything private with tons of underscore-prefixed methods and attributes, since all the caller can see is “an implementation of MachineProtocol”; your state classes can otherwise just be normal classes, which do not require special logic to be instantiated if you want to use them directly.

Also, by making a state for every type, it’s a lot cleaner to express that certain methods require certain attributes, by simply making them available as attributes on that state and then requiring an argument of that state type; you don’t need to plot your way through the outputs generated in your state graph.

What are my plans for it?

I want to finish up dealing with some issues with that branch - particularly the ugly patterns for communicating portions of the state core to the caller and also the documentation; there are a lot of magic signatures which make sense in heavy usage but are a bit mysterious to understand while you’re getting started.

I’d also like the visualizer to work on it, which it doesn’t yet, because the visualizer cribs a bunch of state from MethodicalMachine when it should be working purely on core objects.

Secretly

What is it?

This is an attempt at a holistic, end-to-end secret management wrapper around Keyring. Whereas Keyring handles password storage, this handles the whole lifecycle of looking up the secret to see if it’s there, displaying UI to prompt the user (leveraging a pinentry program from GPG if available)

What have I done with it recently?

It’s been a long time since I touched it.

What are my plans for it?

  • Documentation. It’s totally undocumented.
  • It could be written to be a bit more abstract. It dates from a time before asyncio, so its current Twisted requirement for Deferred could be made into a generic Awaitable one.
  • Better platform support for Linux & Windows when GPG’s pinentry is not available.
  • Support for multiple accounts so that when the user is prompted for the relevant credential, they can store it.
  • Integration with 1Password via some of their many potentially relevant APIs.

Fritter

What is it?

Fritter is a frame-rate independent timer tree.

In the course of developing Twisted, I learned a lot about time and timers. LoopingCall encodes some of this knowledge, but it’s very tightly coupled to the somewhat limited IReactorTime API.

Also, LoopingCall was originally designed with the needs of media playback (particularly network streaming audio playback) in mind, but I have used it more for background maintenance tasks and for animations. Both of these things have requirements that LoopingCall makes awkward but FRITTer is designed to meet:

  1. At higher loads, surprising interactions can occur with the underlying priority queue implementation, and different algorithms may make a significant difference to performance. Fritter has a pluggable implementation of a priority queue and is carefully minimally coupled to it.

  2. Driver selection is a first-class part of the API, with an included, public “Memory” driver for testing, rather than LoopingCall’s “testing is at least possible.reactor attribute. This means that out of the box it supports both Twisted and asyncio, and can easily have other things added.

  3. The API is actually generic on what constitutes time itself, which means that you can use it for both short-term (i.e.: monotonic clock values as float-seconds) and long-term (civil times as timezone-aware datetime objects) recurring tasks. Recurrence rules can also be arbitrary functions.

  4. There is a recursive driver (this is the “tree” part) which both allows for:

    a. groups of timers which can be suspended and resumed together, and

    b. scaling of time, so that you can e.g. speed up or slow down the ticks for AIs, groups of animations, and so on, also in groups.

  5. The API is also generic on what constitutes work. This means that, for example, in a certain timer you can say “all work units scheduled on this scheduler, in addition to being callable, must also have an asJSON method”. And in fact that’s exactly what the longterm module in Fritter does.

I can neither confirm nor deny that this project was factored out of a game engine for a secret game project which does not appear on this list.

What have I done with it recently?

Besides realizing, in the course of writing this blog post, that its CI was failing its code quality static checks (oops), the last big change was the preliminary support for recursive timers and serialization.

What are my plans for it?

  • These haven’t been tested in anger yet and I want to actually use them in a larger project to make sure that they don’t have any necessary missing pieces.

  • Documentation.

Encrust

What is it?

I have written about Encrust quite recently so if you want to know about it, you should probably read that post. In brief, it is a code-shipping tool for py2app. It takes care of architecture-independence, code-signing, and notarization.

What have I done with it recently?

Wrote it. It’s brand new as of this month.

What are my plans for it?

I really want this project to go away as a tool with an independent existence. Either I want its lessons to be fully absorbed into Briefcase or perhaps py2app itself, or for it to become a library that those things call into to do its thing.

Various Small Mac Utilities

What is it?

  • QuickMacApp is a very small library for creating status-item “menu bar apps” in Python which don’t have much of a UI but want to run some Python code in the background and occasionally pop up a notification or ask the user a question or something. The idea is that if you have a utility that needs a minimal UI to just ask the user one or two things, you should be able to give it a GUI immediately, without thinking about it too much.
  • QuickMacHotkey this is a very minimal API to register hotkeys on macOS. this example is what comes up if you search the web for such a thing, but it hasn’t worked on a current Python for about 11 years. This isn’t the “right” way to do such a thing, since it provides no UI to set the shortcut, you’d have to hard-code it. But MASShortcut is now archived and I haven’t had the opportunity to investigate HotKey, so for the time being, it’s a handy thing, and totally adequate for the sort of quick-and-dirty applications you might make with QuickMacApp.
  • VEnvDotApp is a way of giving a virtualenv its own Info.plist and bundle ID, so that command-line python tools that just need to pop up a little mac GUI, like an alert or a notification, can do so with cross-platform tools without looking like it’s an app called “Python”, or in some cases breaking entirely.
  • MOPUp is a command-line updater for upstream Python.org macOS Python. For distributing third-party apps, Python.org’s version is really the one you want to use (it’s universal2, and it’s generally built with compiler options that make it a distributable thing itself) but updating it by downloading a .pkg file from a web browser is kind of annoying.

What have I done with it recently?

I’ve been releasing all these tools as they emerge and are factored out of other work, and they’re all fairly recent.

What are my plans for it?

I will continue to factor out any general-purpose tools from my platform-specific Python explorations — hopefully more Linux and Windows too, once I’ve got writing code for my own computer down, but most of the tools above are kind of “done” on their own, at the moment.

The two things that come to mind though are that QuickMacApp should have a way of owning the menubar sometimes (if you don’t have something like Bartender, menu-bar-status-item-only apps can look like they don’t do anything when you launch them), and that MOPUp should probably be upstreamed to python.org.

Pomodouroboros

What is it?

Pomodouroboros is a pomodoro timer with a highly opinionated take. It’s based on my own experience of ADHD time blindness, and is more like a therapeutic intervention for that specific condition than a typical “productivity” timer app.

In short, it has two important features that I have found lacking in other tools:

  1. A gigantic, absolutely impossible to ignore visual timer that presents a HUD overlay over your entire desktop. It remains low-opacity and static most of the time but pulses every 30 seconds to remind you that time is passing.
  2. Rather than requiring you to remember to set a timer before anything happens, it has an idea of “work hours” when you want to be time-sensitive and presents constant prompting to get started.

What have I done with it recently?

I’ve been working on it fairly consistently lately. The big things I’ve been doing have been:

  1. factoring things out of the Pomodouroboros-specific code and into QuickMacApp and Encrust.
  2. Porting the UI to the redesigned core of the application, which has been implemented and tested in platform-agnostic Python but does not have any UI yet.
  3. fully productionizing the build process and ensuring that Encrust is producing binary app bundles that people can use.

What are my plans for it?

In brief, “finish the app”. I want this to have its own website and find a life beyond the Python community, with people who just want a timer app and don’t care how it’s written. The top priority is to replace the current data model, which is to say the parts of the UI that set and evaluate timers and edit the list of upcoming timers (the timer countdown HUD UI itself is fine).

I also want to port it to other platforms, particularly desktop Linux, where I know there are many users interested in such a thing. I also want to do a CLI version for folks who live on the command line.

Finally: Pomodouroboros serves as a test-bed for a larger goal, which is that I want to make it easier for Python programmers, particularly beginners who are just getting into coding at all, to write code that not only interacts with their own computer, but that they can share with other users in a real way. As you can see with Encrust and other projects above, as much as I can I want my bumpy ride to production code to serve as trailblazing so that future travelers of this path find it as easy as possible.

And Here Is Where The CTA Goes

If this stuff sounds compelling, you can obviously sign up, that would be great. But also, if you’re just curious, go ahead and give some of these projects some stars on GitHub or just share this post. I’d also love to hear from you about any of this!

If a lot of people find this compelling, then pursuing these ideas will become a full-time job, but I’m pretty far from that threshold right now. In the meanwhile, I will also be doing a bit of consulting work.

I believe much of my upcoming month will be spoken for with contracting, although quite a bit of that work will also be open source maintenance, for which I am very grateful to my generous clients. Please do get in touch if you have something more specific you’d like me to work on, and you’d like to become one of those clients as well.


  1. Reasons which will have to remain mysterious until I can edit about 10,000 words of abstract, discursive philosophical rambling into something vaguely readable. 

  2. A strength which is common to many, indeed possibly most, people with ADHD. 

  3. While I want to give myself some leeway to try out ideas without necessarily finishing them, I do not want to start making commitments that I can’t keep. Particularly commitments that are tied to money!