Building And Distributing A macOS Application Written in Python

Even with all the great tools we have, getting a macOS application written in Python all the way to a production-ready build suitable for end users can involve a lot of esoteric trivia.

Why Bother With All This?

In other words: if you want to run on an Apple platform, why not just write everything in an Apple programming language, like Swift? If you need to ship to multiple platforms, you might have to rewrite it all anyway, so why not give up?

Despite the significant investment that platform vendors make in their tools, I fundamentally believe that the core logic in any software application ought to be where its most important value lies. For small, independent developers, having portable logic that can be faithfully replicated on every platform without massive rework might be tricky to get started with, but if you can’t do it, it may not be cost effective to support multiple platforms at all.

So, it makes sense for me to write my applications in Python to achieve this sort of portability, even though on each platform it’s going to be a little bit more of a hassle to get it all built and shipped since the default tools don’t account for the use of Python.

But how much more is “a little bit” more of a hassle? I’ve been slowly learning about the pipeline to ship independently-distributed1 macOS applications for the last few years, and I’ve encountered a ton of annoying roadblocks.

Didn’t You Do This Already? What’s New?

So nice of you to remember. Thanks for asking. While I’ve gotten this to mostly work in the past, some things have changed since then:

  • the notarization toolchain has been updated (altool is now notarytool),
  • I’ve had to ship libraries other than just PyGame,
  • Apple Silicon launched, necessitating another dimension of build complexity to account for multiple architectures,
  • Perhaps most significantly, I have written a tool that attempts to encode as much of this knowledge as possible, Encrust, which I have put on PyPI and GitHub. If this is of interest to you, I would encourage you to file bugs on it, and hopefully add in more corner cases which I have missed.

I’ve also recently shipped my first build of an end-user application that successfully launches on both Apple Silicon and Intel macs, so here is a brief summary of the hoops I needed to jump through, from the beginning, in order to make everything work.

Wait did you say you wrote a tool? Is this fixed, then?

Encrust is, I hope, a temporary stopgap on the way to a much better comprehensive solution.

Specifically, I believe that Briefcase is a much more holistic solution to the general problem being described here, but it doesn’t suit my very specific needs right now4, and it doesn’t address a couple of minor points that I was running into here.

Encrust is mostly glue, shelling out to other tools that already solve portions of the problem, even when better APIs exist. It addresses three very specific layers of complexity:

  1. It enforces architecture independence, so that your app built on an M1 machine will still actually run on about half of the macs remaining out there2.
  2. It remembers tricky nuances of the notarization submission process, such as the highly specific way I need to generate my zip files to avoid mysterious notarization rejections3.
  3. It provides a common and central way to store the configuration for these things across repositories, so I don’t need to repeat this process and copy/paste a shell script every time I make a tiny new application.

It only works on Apple Silicon macs, because I didn’t bother to figure out how pip actually determines which architecture to download wheels for.

As such, unfortunately, Encrust is mostly a place for other people who have already solved this problem to collaborate to centralize this sort of knowledge and share ideas about where this code should ultimately go, rather than a tool for users trying to get started with shipping an app.

Open Offer

That said:

  1. I want to help Python developers ship their Python apps to users who are not also Python developers.
  2. macOS is an even trickier platform to do that on than most.
  3. It’s now easy for me to sign, notarize, and release new applications reliably.

Therefore:

If you have an open source Python application that runs on macOS5 but can’t ship to macOS — either because:

  1. you’ve gotten stuck on one of the roadblocks that this post describes,
  2. you don’t have $100 to give to Apple, or because
  3. the app is using a cross-platform toolkit that should work just fine and you don’t have access to a mac at all, then

Send me an email and I’ll sign and post your releases.

What’s this post about, then?

People still frequently complain that “Python packaging” is really bad. And I’m on record that packaging Python (in the sense of “code”) for Python (in the sense of “deployment platform”) is actually kind of fine right now; if what you’re trying to get to is a package that can be pip installed, you can have a reasonably good experience modulo a few small onboarding hiccups that are well-understood in the community and fairly easy to overcome.

However, it’s still unfortunately hard to get Python code into the hands of users who are not also Python programmers with their own development environments.

My goal here is to document the difficulties themselves to try to provide a snapshot of what happens if you try to get started from scratch today. I think it is useful to record all the snags and inscrutable error messages that you will hit in a row, so we can see what the experience really feels like.

I hope that everyone will find it entertaining.

  • Other Mac python programmers might find pieces of trivia useful, and
  • Linux users will have fun making fun of the hoops we have to jump through on Apple platforms,

but the main audience is the maintainers of tools like Briefcase and py2app, so they can evaluate the new-user experience holistically, and see how much using their tools still feels like this. This necessarily includes the parts of the process that are not actually packaging.

This is why I’m starting from the beginning again, going back through all the stuff that I’ve discussed in previous posts, to present the whole experience.

Here Goes

So, with no further ado, here is a non-exhaustive list of frustrations that I have encountered in this process:

  • Okay. Time to get started. How do I display a GUI at all? Nothing happens when I call some nominally GUI API. Oops: I need my app to exist in an app bundle, which means I need to have a framework build. Time to throw those partially-broken pyenv pythons in the trash, and carefully sidestep around Homebrew; best to use the official python.org installer from here on out.
  • Bonus Frustration since I’m using AppKit directly: why is my app segfaulting all the time? Oh, target is a weak reference in Objective-C, so if I make a window and put a button in it that points at a Python object, the Python interpreter deallocates it immediately, because only the window (which is “nothing” as it’s a weakref) is referring to it. I need to start stuffing every Python object that talks to a UI element like a window or a button into a global list, or manually calling .retain() on all of them and hoping I don’t leak memory (see the sketch just after this list).
  • Everything seems to be using the default Python Launcher icon, and the app menu says “Python”. That wouldn’t look too good to end users. I should probably have my own app.
  • I’ll skip the part here where the author of a new application might have to investigate py2app, briefcase, pyoxidizer, and pyinstaller separately and try to figure out which one works the best right now. As I said above, I started with py2app and I’m stubborn to a fault, so that is the one I’m going to make work.
  • Now I need to set up py2app. Oops, I can’t use pyproject.toml any more, time to go back to setup.py.
  • Now I’ve built it, and the app is crashing on startup when I click on it. I can’t see a traceback anywhere, so I guess I need to do something in the console.
    • Wow; the console is an unusable flood of useless garbage. Forget that.
    • I guess I need to run it in the terminal somehow. After some googling I figure out it’s ./dist/MyApp.app/Contents/MacOS/MyApp. Aha, okay, I can see the traceback now, and it’s … an import error?
    • Ugh, py2app isn’t actually including all of my code, it’s using some magic to figure out which modules are actually used, but it’s doing it by traversing import statements, which means I need to put a bunch of fake static import statements for everything that is used indirectly at the top of my app’s main script so that it gets found by the build. I experimentally discover a half a dozen things that are dynamically imported inside libraries that I use and jam them all in there.
  • Okay. Now at least it starts up. The blank app icon is uninspiring, though, time to actually get my own icon in there. Cool, I’ll make an icon in my favorite image editor, and save it as... icons must be PNGs, right? Uhh... no, looks like they have to be .icns files. But luckily I can convert the PNG I saved with a simple 12-line shell script that invokes sips and iconutil6.
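
Here is a minimal sketch of the keep-a-global-list workaround from the “Bonus Frustration” item above, assuming PyObjC is installed; Clicker and _KEEPALIVE are illustrative names, not from any library:

from AppKit import NSButton
from Foundation import NSObject

_KEEPALIVE: list = []  # strong references to Python objects that UI elements target

class Clicker(NSObject):
    def clicked_(self, sender):
        print("button clicked")

def make_button(title: str) -> NSButton:
    clicker = Clicker.alloc().init()
    # Without this, the button's weak `target` would be the only reference,
    # and Python would deallocate `clicker` immediately.
    _KEEPALIVE.append(clicker)
    return NSButton.buttonWithTitle_target_action_(title, clicker, "clicked:")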

At this point I have an app bundle which kinda works. But in order to run on anyone else’s computer, I have to code-sign it.

  • In order to code-sign anything, I have to have an account with Apple that costs $99 per year, on developer.apple.com.
  • The easiest way to get these certificates is to log in to Xcode itself. There’s a web portal too but using it appears to involve a lot more manual management of key material, so, no thanks. This requires the full-fat Xcode.app though, not just the command-line tools that come down when I run xcode-select --install, so, time to wait for an 11GB download.
  • Oops, I made the wrong certificate type. Apparently the only right answer here is a “Developer ID Application” certificate.
  • Now that I’ve logged in to Xcode to get the certificate, I need to figure out how to tell my command-line tools about it (for starters, “codesign”). Looks like I need to run security find-identity -v -p codesigning.
  • Time to sign the application’s code.
    • The codesign tool has a --deep option which can sign the whole bundle. Great!
    • Except, that doesn’t work, because Python ships shared libraries in locations that macOS doesn’t automatically expect, so I have to manually locate those files and sign them, invoking codesign once for each.
    • Also, --deep is deprecated. There’s no replacement.
    • Logically, it seems like I still need --deep, because it does some poorly-explained stuff with non-code resource files that maybe doesn’t happen properly if I don’t? Oh well. Let's drop the option and hope for the best.8
    • With a few heuristics I think we can find all the relevant files with a little script7, roughly sketched below.
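
That script looks roughly like the following; a minimal sketch, assuming the bundle layout py2app produces, with heuristics (file extensions and Mach-O magic numbers) that, as footnote 7 warns, may still miss cases. The identity string and entitlements.plist are placeholders:

import os
import subprocess

IDENTITY = "Developer ID Application: Your Name (TEAMID)"  # placeholder

# Mach-O magic numbers as they appear on disk: 64-bit, and "fat" universal.
MACHO_MAGICS = {b"\xcf\xfa\xed\xfe", b"\xca\xfe\xba\xbe"}

def is_macho(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(4) in MACHO_MAGICS

def sign_bundle(bundle: str) -> None:
    # Sign every embedded library (and stray executable) first, then the
    # bundle itself. The hardened runtime and entitlements are notarization
    # requirements explained below.
    for dirpath, _, filenames in os.walk(bundle):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if name.endswith((".so", ".dylib", ".a")) or (
                os.access(path, os.X_OK) and is_macho(path)
            ):
                subprocess.run(
                    ["codesign", "--force", "--options", "runtime",
                     "--sign", IDENTITY, path],
                    check=True,
                )
    subprocess.run(
        ["codesign", "--force", "--options", "runtime",
         "--entitlements", "entitlements.plist", "--sign", IDENTITY, bundle],
        check=True,
    )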

Now my app bundle is signed! Hooray. 12 years ago, I’d be all set. But today I need some additional steps.

  • After I sign my app, Apple needs to sign my app (to indicate they’ve checked it for malware), which is called “notarization”.
    • In order to be eligible for notarization, I can’t just code-sign my app. I have to code-sign it with entitlements.
    • Also I can’t just code sign it with entitlements, I have to sign it with the hardened runtime, or it fails notarization.
    • Oops, out of the box, the hardened runtime is incompatible with a bunch of stuff in Python, including cffi and ctypes, because nobody has implemented support for MAP_JIT yet, so it crashes at startup. After some thrashing around I discover that I need a legacy “allow unsigned executable memory” entitlement. I can’t avoid these imports, because a bunch of things in py2app’s bootstrapping code import things that use ctypes, and a bunch of packages which I’m definitely going to need, like cryptography, require cffi directly anyway.
    • In order to set up notarization external to Xcode, I need to create an App Password which is set up at appleid.apple.com, not the developer portal.
    • Bonus Frustration since I’ve been doing this for a few years: Originally this used to be even more annoying as I needed to wait for an email (with altool), and so I couldn’t script it directly. Now, at least, the new notarytool (which will shortly be mandatory) has a --wait flag.
    • Although the tool is documented under man notarytool, I actually have to run it as xcrun notarytool, even though codesign can be run either directly or via xcrun codesign.
    • Great, we’re ready to zip up our app and submit to Apple. Wait, they’re rejecting it? Why???
    • Aah, I need to manually copy and paste the UUID in the console output of xcrun notarytool submit into xcrun notarytool log to get some JSON that has some error messages embedded in it.
    • Oh. The bundle contains internal symlinks, so when I zipped it without the -y option, I got a corrupt archive.
    • Great, resubmitted with zip -y.
    • Oops, just kidding, that only works sometimes. Later, a different submission with a different hash will fail, and I’ll learn that the correct command line is actually ditto -c -k --sequesterRsrc --keepParent MyApp.app MyApp.app.zip.
      • Note that, for extra entertainment value, the positions of the archive and the source directory on the command line are reversed relative to zip (and tar, and every other archive tool).
    • notarytool doesn’t record anything in my app though; it puts the “notarization ticket” on Apple's servers. Apparently, I still need to run stapler for users to be able to launch it while those servers are inaccessible, like, for example, if they’re offline.
    • Oops, not stapler. xcrun stapler. Whatever.
    • Except notarytool operates on a zip archive, but stapler operates on an app bundle. So we have to save the original app bundle, run stapler on it, then re-archive the whole thing into a new archive; the full dance is sketched below.
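
Putting that archive/submit/staple/re-archive dance together, here is a minimal sketch; “my-notary-profile” is a placeholder for credentials stored with xcrun notarytool store-credentials:

import subprocess

def run(*argv: str) -> None:
    subprocess.run(argv, check=True)

def notarize(bundle: str, profile: str = "my-notary-profile") -> None:
    archive = bundle + ".zip"
    # ditto, not zip: it preserves symlinks and metadata the way notarization
    # expects. Note the argument order: source first, archive last.
    run("ditto", "-c", "-k", "--sequesterRsrc", "--keepParent", bundle, archive)
    # --wait blocks until Apple accepts or rejects the submission.
    run("xcrun", "notarytool", "submit", archive,
        "--keychain-profile", profile, "--wait")
    # The notarization ticket lives on Apple's servers; staple it to the
    # *bundle* so the app can launch offline, then re-archive for upload.
    run("xcrun", "stapler", "staple", bundle)
    run("ditto", "-c", "-k", "--sequesterRsrc", "--keepParent", bundle, archive)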

Hooray! Time to release my great app!

  • Whoops, just got a bug report that it crashes immediately on every Intel mac. What’s going on?
  • Turns out I’m using a library whose authors distribute both arm64 and x86_64 wheels; pip will prefer single-architecture wheels even if universal2 wheels are also available, so I’ve got to somehow get fat binaries put together. Am I going to have to build a huge pile of C code by myself? I thought all these Python hassles would at least let me avoid the C hassles!
  • Whew, okay, no need for that: there’s an amazing Swiss-army knife for macOS binary wheels, called delocate, which includes a delocate-fuse tool that can fuse two wheels together. So I just need to figure out which binaries are the wrong architecture and somehow install my fixed/fused wheels before building my app with py2app.

    • Except, oops, this tool just rewrites the file in-place without even changing its name, so I have to write some janky shell scripts to do the reinstallation (roughly sketched below). Ugh.
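
Those janky scripts amount to something like this; a rough sketch, assuming delocate’s command-line interface at the time of writing, with illustrative paths and a by-hand rename of the platform tag:

import subprocess
from pathlib import Path

def fuse(arm64_wheel: Path, x86_64_wheel: Path, out: Path) -> Path:
    # delocate-fuse writes the fused wheel under the *first* wheel's name,
    # so we rename the platform tag to universal2 ourselves afterwards.
    subprocess.run(
        ["delocate-fuse", str(arm64_wheel), str(x86_64_wheel), "-w", str(out)],
        check=True,
    )
    fused = out / arm64_wheel.name
    universal = fused.with_name(fused.name.replace("arm64", "universal2"))
    fused.rename(universal)
    return universal

# ...then `pip install --force-reinstall` the resulting universal2 wheel
# into the virtualenv before building with py2app.
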
  • OK now that all that is in place, I just need to re-do all the steps:

    • universal2-ize my virtualenv!
    • build!
    • sign!
    • archive!
    • notarize!
    • wait!!!
    • staple!
    • re-archive!
    • upload!

And we have an application bundle we can ship to users.

It’s just that easy.

As long as I don’t need sandboxing or Mac App Store distribution, of course. That’s a challenge for another day.


So, that was terrible. But what should be happening here?

Some of this is impossible to simplify beyond a certain point - many of the things above are not really about Python, but are about distribution requirements for macOS specifically, and we in the Python community can’t affect operating system vendors’ tooling.

What we can do is build tools that produce clear guidance on what step is required next, handle edge cases on their own, and generally guide users through these complex processes without requiring them to hit weird binary-format or cryptographic-signing errors on their own with no explanation of what to do next.

I do not think that documentation is the answer here. The necessary steps should be discoverable. If you need to go to a website, the tool should use the webbrowser module to open a website. If you need to launch an app, the tool should launch that app.

With Encrust, I am hoping to generalize the solutions that I found while working on this for this one specific slice of the app distribution pipeline — i.e. a macOS desktop application, distributed independently and not through the mac app store — but other platforms will need the same treatment.

However, even without really changing py2app or any of the existing tooling, we could imagine a tool that would interactively prompt the user for each manual step, automate as much of it as possible, verify that it was performed correctly, and give comprehensible error messages if it was not.

For a lot of users, this full code-signing journey may not be necessary; if you just want to run your code on one or two friends’ computers, telling them to right click, go to ‘open’ and enter their password is not too bad. But it may not even be clear to them what the trade-off is, exactly; it looks like the app is just broken when you download it. The app build pipeline should tell you what the limitations are.

Other parts of this just need bug-fixes to address. py2app specifically, for example, could have a better self-test for its module-collecting behavior, launching an app to make sure it didn’t leave anything out.

Interactive prompts to set up a Homebrew tap, or a Flatpak build, or a Microsoft Store Metro app, might be similarly useful. These all have outside-of-Python required manual steps, and all of them are also amenable to at least partial automation.


Thanks to my patrons for supporting this sort of work, including development of Encrust, of Pomodouroboros, of posts like this one and of that offer to sign other people’s apps. If you think this sort of stuff is worthwhile, you might want to consider supporting my work as well.


  1. I am not even going to try to describe building a sandboxed, app-store ready application yet. 

  2. At least according to the Steam Hardware Survey, which as of this writing in March of 2023 pegs the current user-base at 54% apple silicon and 46% Intel. The last version I can convince the Internet Archive to give me, from December of 2022, has it closer to 51%/49%, which suggests a transition rate of 1% per month. I suspect that this is pretty generous to Apple Silicon as Steam users would tend to be earlier adopters and more sensitive to performance, but mostly I just don’t have any other source of data. 

  3. It is truly remarkable how bad the error reporting from the notarization service is. There are dozens of articles and forum posts around the web like this one where someone independently discovers this failure mode after successfully notarizing a dozen or so binaries and then suddenly being unable to do so any more because one of the bytes in the signature is suddenly not valid UTF-8 or something. 

  4. A lot of this is probably historical baggage; I started with py2app in 2008 or so, and I have been working on these apps in fits and starts for… ugh… 15 years. At some point when things are humming along and there are actual users, a more comprehensive retrofit of the build process might make sense, but right now I just want to stop thinking about this.

  5. If your application isn’t open source, or if it requires some porting work, I’m also available for light contract work, but it might take a while to get on my schedule. Feel free to reach out as well, but I am not looking to spend a lot of time doing porting work. 

  6. I find this particular detail interesting; it speaks to the complexity and depth of this problem space that this has been a known issue for several years in Briefcase, but there’s just so much other stuff to handle in the release pipeline that it remains open. 

  7. I forgot both .a files and the py2app-included python executable itself here, and had to discover that gap when I signed a different app where that made a difference. 

  8. Thus far, it seems to be working. 

Data Classification

Does Python still have a need for class without @dataclass?

Is there a place for non-@dataclass classes in Python any more?

I have previously — and somewhat famously — written favorably about @dataclass’s venerable progenitor, attrs, and how you should use it for pretty much everything.

At the time, attrs was an additional dependency, a piece of technology that you could bolt on to your Python stack to make your particular code better. While I advocated for it strongly, there are all the usual implicit reasons against using a new thing. It was an additional dependency, it might not interoperate with other convenience mechanisms for type declarations that you were already using (e.g. NamedTuple), it might look weird to other Python programmers familiar with existing tools, and so on. I don’t think that any of these were good counterpoints, but there was nevertheless a robust discussion to be had in addressing them all.

But for many years now, dataclasses have been — and currently are — built in to the language. They are increasingly integrated to the toolchain at a deep level that is difficult for application code — or even other specialized tools — to replicate. Everybody knows what they are. Few or none of those reasons apply any longer.

For example, classes defined with @dataclass are now optimized as a C structure might be when you compile them with mypyc, a trick that is extremely useful in some circumstances, which even attrs itself now has trouble keeping up with.

This all raises the question for me: beyond backwards compatibility, is there any point to having non-@dataclass classes any more? Is there any remaining justification for writing them in new code?

Consider my original example, translated from attrs to dataclasses. First, the non-dataclass version:

class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

And now the dataclass one:

from dataclasses import dataclass

@dataclass
class Point3D:
    x: int
    y: int
    z: int

Many of my original points still stand. It’s still less repetitive. In fewer characters, we’ve expressed considerably more information, and we get more functionality (a legible repr, equality comparison, and opt-in ordering and hashing). There doesn’t seem to be much of a downside besides the strictness of the types, and if typing.Any were a builtin, x: any would be fine for those who don’t want to unduly constrain their code.
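
For instance, the generated methods are immediately visible with the dataclass version of Point3D above:

p = Point3D(1, 2, 3)
print(p)                       # Point3D(x=1, y=2, z=3): a legible repr for free
print(p == Point3D(1, 2, 3))   # True: field-by-field equality, no __eq__ boilerplate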

The one real downside of the latter over the former right now is the need for an import. Which, at this point, just seems… confusing? Wouldn’t it be nicer to be able to just write this:

class Point3D:
    x: int
    y: int
    z: int

and not need to faff around with decorator semantics and fudging the difference between Mypy (or Pyright or Pyre) type-check-time and Mypyc or Cython compile time? Or even better, to not need to explain the complexity of all these weird little distinctions to new learners of Python, and to not have to cover import before class?

These tools all already treat the @dataclass decorator as a totally special language construct, not really like a decorator at all, so to really explore it you have to explain a special case and then a special case of a special case. The extension hook for this special case of the special case notwithstanding.

If we didn’t want any new syntax, we would need a from __future__ import dataclassification or some such for a while, but this doesn’t seem like an impossible bar to clear.


There are still some folks who don’t like type annotations at all, and there’s still the possibility of awkward implicit changes in meaning when transplanting code from a place with dataclassification enabled to one without, so perhaps an entirely new unambiguous syntax could be provided. One that more closely mirrors the meaning of parentheses in def, moving inheritance (a feature which, whether you like it or not, is clearly far less central to class definitions than ‘what fields do I have’) off to its own part of the syntax:

data Point3D(x: int, y: int, z: int) from Vector:
    def method(self):
        ...

which, for the “I don’t like types” contingent, could reduce to this in the minimal case:

data Point3D(x, y, z):
    pass

Just thinking pedagogically, I find it super compelling to imagine moving from teaching def foo(x, y, z):... to data Foo(x, y, z):... as opposed to @dataclass class Foo: x: int....

I don’t have any desire for semantic changes to accompany this, just to make it possible for newcomers to ignore the circuitous historical route of the @dataclass syntax and get straight into defining their own types with legible reprs from the very beginning of their Python journey.

(And make it possible for me to skip a couple of lines of boilerplate in short examples, as a bonus.)


I’m curious to know what y’all think, though. Shoot me an email or a toot and let me know.

In particular:

  1. Do you think there’s some reason I’m missing why Python’s current method for defining classes via a bunch of dunder methods is still better than dataclasses, or should stick around into the future for reasons beyond “compatibility”?
  2. Do you think “compatibility” is sufficient reason to keep the syntax the way it is forever, and I’m underestimating the cost of adding a keyword like this?
  3. If you do think that a change should be made, would you prefer:
    1. changing the meaning of class itself via a __future__ import,
    2. a new data keyword like the one I’ve proposed,
    3. a new keyword that functions exactly like the one I have proposed, but you really want to bikeshed the word data a bunch,
    4. something more incremental like just putting dataclass and field in builtins,
    5. or an option I haven’t even contemplated here?

If I find I’m not alone in this perhaps I will wander over to the Python discussion boards to have a more substantive conversation...


Thank you to my patrons who are helping me while I try to turn… whatever this is… along with open source maintenance and application development, into a real job. Do you want to see me pursue ideas like this one further? If so, you can support my work as a sponsor!

A Very Silly Program

This program will not work on your computer.

One of the persistently lesser-known symptoms of ADHD is hyperfocus. It is sometimes quasi-accurately described as a “superpower”1 2, which it can be. In the right conditions, hyperfocus is the ability to effortlessly maintain a singular locus of attention for far longer than a neurotypical person would be able to.

However, as a general rule, it would be more accurate to characterize hyperfocus not as an “ability to focus on X” but rather as “an inability to focus on anything other than X”. Sometimes hyperfocus comes on and it just digs its claws into you and won’t let go until you can achieve some kind of closure.

Recently, the X I could absolutely not stop focusing on — for days at a time — was this extremely annoying picture:

[image: chroma subsampling carnage]

Which led to me writing the silliest computer program I have written in quite some time.


You see, for some reason, macOS seems to prefer YUV422 chroma subsampling3 on external displays, even when the bitrate of the connection and selected refresh rate support RGB.4 Lots of people have been trying to address this for a literal decade5 6 7 8 9 10 11, and the problem has gotten worse with Apple Silicon, where the operating system no longer even supports the EDID-override functionality available on every other PC operating system that supports plugging in a monitor.

In brief, this means that every time I unplug my MacBook from its dock and plug it back in more than 5 minutes later, its color accuracy is destroyed and red or blue text on certain backgrounds looks like that mangled mess in the picture above. Worse, while the color distinction is definitely noticeable, it’s so subtle that it’s like my display is constantly gaslighting me. I can almost hear it taunting me:

Magenta? Yeah, magenta always looked like this. Maybe it’s the ambient lighting in this room. You don’t even have a monitor hood. Remember how you had to use one of those for print design validation? Why would you expect it to always look the same without one?

Still, I’m one of the luckier people with this problem, because I can seem to force RGB / 444 color format on my display just by leaving the display at 120Hz rather than 144, then toggling HDR on and then off again. At least I don’t need to plug in the display via multiple HDMI and displayport cables and go into the OSD every time. However, there is no API to adjust, or even discover, the chroma format of your connected display’s link, and even the accessibility features that supposedly let you drive GUIs are broken in the system settings “Displays” panel12, so you have to do it by sending synthetic keystrokes and hoping you can tab-focus your way to the right place.

Anyway, this is a program which will be useless to anyone else as-is, but if someone else is struggling with the absolute inability to stop fiddling with the OS to try and get colors to look correct on a particular external display, by default, all the time, maybe you could do something to hack on this:

import os
from Quartz import CGDisplayRegisterReconfigurationCallback, kCGDisplaySetMainFlag, kCGDisplayBeginConfigurationFlag
from ColorSync import CGDisplayCreateUUIDFromDisplayID
from CoreFoundation import CFUUIDCreateString
from AppKit import NSApplicationMain, NSApplicationActivationPolicyAccessory, NSApplication

NSApplication.sharedApplication().setActivationPolicy_(NSApplicationActivationPolicyAccessory)

CGDirectDisplayID = int
CGDisplayChangeSummaryFlags = int

MY_EXTERNAL_ULTRAWIDE = '48CEABD9-3824-4674-9269-60D1696F0916'
MY_INTERNAL_DISPLAY = '37D8832A-2D66-02CA-B9F7-8F30A301B230'

def cb(display: CGDirectDisplayID, flags: CGDisplayChangeSummaryFlags, userInfo: object) -> None:
    if flags & kCGDisplayBeginConfigurationFlag:
        return
    if flags & kCGDisplaySetMainFlag:
        displayUuid = CGDisplayCreateUUIDFromDisplayID(display)
        uuidString = CFUUIDCreateString(None, displayUuid)
        print(uuidString, "became the main display")
        if uuidString == MY_EXTERNAL_ULTRAWIDE:
            print("toggling HDR to attempt to clean up subsampling")
            os.system("/Users/glyph/.local/bin/desubsample")
            print("HDR toggled.")

print("registered", CGDisplayRegisterReconfigurationCallback(cb, None))

NSApplicationMain([])

and the linked desubsample is this atrocity, which I substantially cribbed from this helpful example:

#!/usr/bin/osascript

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

tell application "System Settings"
    quit
    delay 1
    activate
    current application's NSWorkspace's sharedWorkspace()'s openURL:(current application's NSURL's URLWithString:"x-apple.systempreferences:com.apple.Displays-Settings.extension")
    delay 0.5

    tell application "System Events"
    tell process "System Settings"
        key code 48
        key code 48
        key code 48
            delay 0.5
        key code 49
        delay 0.5
        -- activate hdr on left monitor

        set hdr to checkbox 1 of group 3 of scroll area 2 of ¬
                group 1 of group 2 of splitter group 1 of group 1 of ¬
                window "Displays"
        tell hdr
                click it
                delay 1.0
                if value is 1 then
                    click it
                end if
        end tell

    end tell
    end tell
    quit
end tell

This ridiculous little pair of programs does it automatically, so whenever I reconnect my MacBook to my desktop dock at home, it faffs around with clicking the HDR button for me every time. I am leaving it running in a background tmux session so — hopefully — I can finally stop thinking about this.

Potato Programming

One potato, two potato, three potato, four…

One potato, two potato, three potato, four
Five potato, six potato, seven potato, more.

Traditional Children’s Counting Rhyme

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Knuth, Donald
“Structured Programming with go to statements”
Computing Surveys, Vol. 6, No. 4, December 1974
(p. 268)
(Emphasis mine)

Knuth’s admonition about premature optimization is such a cliché among software developers at this point that even the correction to include the full context of the quote is itself a cliché.

Still, it’s a cliché for a reason: the speed at which software can be written is in tension — if not necessarily in conflict — with the speed at which it executes. As Nelson Elhage has explained, software can be qualitatively worse when it is slow, but spending time optimizing an algorithm before getting any feedback from users or profiling the system as a whole can lead one down many blind alleys of wasted effort.

In that same essay, Nelson further elaborates that performant foundations simplify architecture1. He then follows up with several bits of architectural advice that are highly specific to parsing—compilers and type-checkers specifically—which, while good, are hard to generalize beyond “optimizing performance early can also be good”.

So, here I will endeavor to generalize that advice. How does one provide a performant architectural foundation without necessarily wasting a lot of time on early micro-optimization?

Enter The Potato

Many years before Nelson wrote his excellent aforementioned essay, my father coined a related term: “Potato Programming”.

In modern vernacular, a potato is very slow hardware, and “potato programming” is the software equivalent of the same.

The term comes from the rhyme that opened this essay, and is meant to evoke a slow, childlike counting of individual elements as an algorithm operates upon them. It is an unfortunately common software-architectural idiom whereby interfaces are provided in terms of scalar values: in other words, APIs that require you to use for loops or other forms of explicit, individual, non-parallelized iteration. But this is all very abstract; an example might help.

For a generic business-logic example, let’s consider the problem of monthly recurring billing. Every month, we pull in the list of all subscriptions to our service, and we bill them.

Since our hypothetical company has an account-management team that owns the UI which updates subscriptions and a billing backend team that writes code to interface with 3rd-party payment providers, we’ll create 2 backends, here represented by some Protocols.

Finally, we’ll have an orchestration layer that puts them together to actually run the billing. I will use async to indicate which things require a network round trip:

from __future__ import annotations

from typing import AsyncIterable, Protocol

money = int  # placeholder for a real money type (e.g. a Decimal of cents)

class SubscriptionService(Protocol):
    async def all_subscriptions(self) -> AsyncIterable[Subscription]:
        ...

class Subscription(Protocol):
    account_id: str
    to_charge_per_month: money

class BillingService(Protocol):
    async def bill_amount(self, account_id: str, amount: money) -> None:
        ...

To many readers, this may look like an entirely reasonable interface specification; indeed, it looks like a lot of real, public-facing “REST” APIs. An equally apparently-reasonable implementation of our orchestration between them might look like this:

async def billing(s: SubscriptionService, b: BillingService) -> None:
    async for sub in s.all_subscriptions():
        await b.bill_amount(sub.account_id, sub.to_charge_per_month)

This is, however, just about the slowest implementation of this functionality that it’s possible to implement. So, this is the bad version. Let’s talk about the good version: no-tato programming, if you will. But first, some backstory.

Some Backstory

My father began his career as an APL programmer, and one of the key insights he took away from APL’s architecture is that, as he puts it:

Computers like to do things over and over again. They like to do things on arrays. They don’t want to do things on scalars. So, in fact, it’s not possible to write a program that only does things on a scalar. [...] You can’t have an ‘integer’ in APL, you can only have an ‘array of integers’. There’s no ‘loop’s, there’s no ‘map’s.

APL, like Python2, is typically executed via an interpreter. Which means, like Python, execution of basic operations like calling functions can be quite slow. However, unlike Python, its pervasive reliance upon arrays meant that almost all of its operations could be safely parallelized, and would only get more and more efficient as more and more parallel hardware was developed.

I said ‘unlike Python’ there, but in fact, my father first related this concept to me regarding a part of the Python ecosystem which follows APL’s design idiom: NumPy. NumPy takes a similar approach: it cannot itself do anything to speed up Python’s fundamental interpreted execution speed3, but it can move the intensive numerical operations that it implements into operations on arrays, rather than operations on individual objects, whether numbers or not.

The performance difference involved in these two styles is not small. Consider this case study which shows a 5828% improvement4 when taking an algorithm from idiomatic pure Python to NumPy.
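
To make the stylistic difference concrete, here is a minimal (and unscientific) illustration of the two idioms; the exact speedup will vary:

import numpy as np

xs = np.random.default_rng(0).random(1_000_000)

# One-potato, two-potato: a Python-level loop over scalars.
total = 0.0
for x in xs:
    total += x * x

# Array-at-a-time: one vectorized operation over the whole array, which
# runs in optimized native code rather than the bytecode interpreter.
total_vectorized = float(np.dot(xs, xs))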

This idiom is also more or less how GPU programming works. GPUs cannot operate on individual values. You submit a program5 to the GPU, as well as a large array of data6, and the GPU executes the program on that data in parallel across hundreds of tiny cores. Submitting individual values for the GPU to work on would actually be much slower than just doing the work on the CPU directly, due to the bus latency involved to transfer the data back and forth.

Back from the Backstory

This is all interesting for a class of numerical software — and indeed it works very well there — but it may seem a bit abstract to web backend developers just trying to glue together some internal microservice APIs, or indeed most app developers who aren’t working in those specialized fields. It’s not like Stripe is going to let you run their payment service on your GPU.

However, the lesson generalizes quite well: anywhere you see an API defined in terms of one-potato, two-potato iteration, ask yourself: “how can this be turned into an array?” Let’s go back to our example.

The simplest change that we can make, as a consumer of these potato-shaped APIs, is to submit them in parallel. So if we have to do the optimization in the orchestration layer, we might get something more like this:

from asyncio import Semaphore, AbstractEventLoop

async def one_bill(
    loop: AbstractEventLoop,
    sem: Semaphore,
    sub: Subscription,
    b: BillingService,
) -> None:
    await sem.acquire()
    async def work() -> None:
        try:
            await b.bill_amount(sub.account_id, sub.to_charge_per_month)
        finally:
            sem.release()
    loop.create_task(work())  # schedule the coroutine, not the bare function

async def billing(
    loop: AbstractEventLoop,
    s: SubscriptionService,
    b: BillingService,
    batch_size: int,
) -> None:
    sem = Semaphore(batch_size)
    async for sub in s.all_subscriptions():
        await one_bill(loop, sem, sub, b)

This is an improvement, but it’s a bit of a brute-force solution; a multipotato, if you will. We’ve moved the work to the billing service faster, but it still has to do just as much work. Maybe even more work, because now it’s potentially got a lot more lock-contention on its end. And we’re still waiting for the Subscription objects to dribble out of the SubscriptionService potentially one request/response at a time.

In other words, we have used network concurrency as a hack to simulate a performant design. But the back end that we have been given here is not actually optimizable; we do not have a performant foundation. As you can see, we have even had to change our local architecture a little bit here, to include a loop parameter and a batch_size which we had not previously contemplated.

A better-designed interface in the first place would look like this:

from __future__ import annotations

from dataclasses import dataclass
from typing import AsyncIterable, Protocol, Sequence

class SubscriptionService(Protocol):
    async def all_subscriptions(
        self, batch_size: int,
    ) -> AsyncIterable[Sequence[Subscription]]:
        ...

class Subscription(Protocol):
    account_id: str
    to_charge_per_month: money

@dataclass
class BillingRequest:
    account_id: str
    amount: money

class BillingService(Protocol):
    async def submit_bills(
        self,
        bills: Sequence[BillingRequest],
    ) -> None:
        ...

Superficially, the implementation here looks slightly more awkward than our naive first attempt:

async def billing(s: SubscriptionService, b: BillingService, batch_size: int) -> None:
    async for sub_batch in s.all_subscriptions(batch_size):
        await b.submit_bills(
            [
                BillingRequest(sub.account_id, sub.to_charge_per_month)
                for sub in sub_batch
            ]
        )

However, while the implementation with batching in the backend is approximately as performant as our parallel orchestration implementation, backend batching has a number of advantages over parallel orchestration.

First, backend batching has less internal complexity; no need to have a Semaphore in the orchestration layer, or to create tasks on an event loop. There’s less surface area here for bugs.

Second, and more importantly: backend batching permits for future optimizations within the backend services, which are much closer to the relevant data and can achieve more substantial gains than we can as a client without knowledge of their implementation.

There are many ways this might manifest, but consider that each of these services has their own database, and have got to submit queries and execute transactions on those databases.

In the subscription service, it’s faster to run a single SELECT statement that returns a bunch of results than to select a single result at a time. On the billing service’s end, it’s much faster to issue a single INSERT or UPDATE and then COMMIT for N records at once than to concurrently issue a ton of potentially related modifications in separate transactions.
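
As a minimal sketch of what that looks like on the billing side, using sqlite3 for brevity; the table and column names are made up:

import sqlite3

def submit_bills(conn: sqlite3.Connection, bills: list[tuple[str, int]]) -> None:
    # One statement and one COMMIT for the whole batch, rather than a
    # separate transaction per bill contending for the same locks.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO pending_charges (account_id, amount) VALUES (?, ?)",
            bills,
        )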

Potato No Mo

The initial implementation within each of these backends can be as naive and slow as necessary to achieve an MVP. You can do a SELECT … LIMIT 1 internally, if that’s easier, and performance is not important at first. There can be a mountain of potatoes hidden behind the veil of that batched list. In this way, you can avoid the potential trap of premature optimization. Maybe this is a terrible factoring of services for your application in the first place; best to have that prototype in place and functioning quickly so that you can throw it out faster!

However, by initially designing an interface based on lists of things rather than individual things, it’s much easier to hide irrelevant implementation details from the client, and to achieve meaningful improvements when optimizing.

Acknowledgements

This is the first post supported by my patrons, with a topic suggested by a member of my Patreon!


  1. It’s a really good essay, you should read it. 

  2. Yes, I know it’s actually bytecode compiled and then run on a custom interpreting VM, but for the purposes of comparing these performance characteristics “interpreted” is a more accurate approximation. Don’t @ me. 

  3. Although, thankfully, a lot of folks are now working very hard on that problem. 

  4. No, not a typo, that’s a 4-digit improvement. 

  5. Typically called a “shader” due to its origins in graphically shading polygons. 

  6. The data may represent vertices in a 3-D mesh, pixels in texture data, or, in the case of general-purpose GPU programming, “just a bunch of floating-point numbers”. 

Super Swing Districts

Donate now to save democracy. Please. I like democracy.

In my corner of the social graph, when we talk about politics today, we tend to use a lot of moralizing language. A lot of emotive language. And that makes sense; overt fascists are repeating the strategy of using the right of trans people to, like, be alive, as a wedge issue to escalate to full-blown eugenics and antisemitism. There’s a lot of moral stuff and a lot of emotional stuff happening there.

But when we get down to it, politics is a highly technical discipline that requires a lot of work. You don’t need to just have the right opinion, you have to actually do a lot of math to figure out efficient ways to deploy resources, effective strategies to convince the undecided and to command the attention of the disengaged. It’s also adversarial: the bad guys are trying to do the same thing, so if you do find some efficient way to campaign, they will soon find out and try to dismantle it.

So while we might talk abstractly about “doing the work”, a lot of the work is tedious and difficult analysis of a lot of very confusing numbers. Not to mention the fact that it requires maintaining the tenacious mindset of a happy Sisyphus due to its adversarial nature. To be frank, I’m not great at either of those things.

Luckily, my uncle is. He is a professor of political science who — beyond the obvious familial bias I might have — I tend to think is a really smart guy with a lot of good ideas. More importantly, however, is that he does do “the work” I’m talking about here.

So here is some of that work: SuperSwingDistricts.org. This is a slate of democratic downballot candidates for office across the USA who need your support right now. Specifically, it is a carefully curated slate designed to maximize spending efficiency via the reverse-coattails effect, multiplied by finding the areas with the most overlapping high-leverage elections. You can read more about the specifics on the website, including the vetting of the candidates’ bona fides, but you can also just take my word for it and Donate Now via ActBlue. Just like... gobs of money.

Political fundraising is not really my wheelhouse, and I am not that comfortable doing it. I hope that we can stop this “democracy” machine from constantly falling apart all the time so I can work on fixing the other broken systems in my life, like Python-language native application packaging for various platforms. But this one is really, really important. Many of these candidates are in pivotal positions that will help prevent authoritarians from seizing the actual physical mechanisms of elections themselves, and attempting a more successful coup in 2024.

So: donate now.