Stop Writing `__init__` Methods

YEARS OF DATACLASSES yet NO REAL-WORLD USE FOUND for overriding special methods just so you can have some attributes.

The History

Before dataclasses were added to Python in version 3.7 — in June of 2018 — the __init__ special method had an important use. If you had a class representing a data structure — for example a 2DCoordinate, with x and y attributes — you would want to be able to construct it as 2DCoordinate(x=1, y=2), which would require you to add an __init__ method with x and y parameters.

The other options available at the time all had pretty bad problems:

  1. You could remove 2DCoordinate from your public API and instead expose a make_2d_coordinate function and make it non-importable, but then how would you document your return or parameter types?
  2. You could document the x and y attributes and make the user assign each one themselves, but then 2DCoordinate() would return an invalid object.
  3. You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all 2DCoordinate objects to be not just mutable, but mutated at every call site.
  4. You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, typing.Protocol didn’t even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.

Also, an __init__ method that does nothing but assign a few attributes doesn’t have any significant problems of its own. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice in most cases.

However, by accepting “define a custom __init__” as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.

Wherever there is arbitrary code, there are arbitrary problems.

The Problems

Let’s consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a FileReader.

Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.

Let’s assume a world where we have the following functions, in an imaginary fileio module:

  • open(path: str) -> int
  • read(fileno: int, length: int) -> bytes
  • close(fileno: int) -> None

Our hypothetical fileio.open returns an integer representing a file descriptor1, fileio.read allows us to read length bytes from an open file descriptor, and fileio.close closes that file descriptor, invalidating it for future use.
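
The fileio module is imaginary, but if you want to actually run the examples that follow, a minimal stand-in is easy to sketch as a thin wrapper over the standard library’s os module. This is a sketch only; the details don’t matter for the argument.

# fileio.py: a hypothetical stand-in so that the examples below can actually run.
import os

def open(path: str) -> int:
    # Returns a plain integer file descriptor, matching the signature above.
    return os.open(path, os.O_RDONLY)

def read(fileno: int, length: int) -> bytes:
    return os.read(fileno, length)

def close(fileno: int) -> None:
    os.close(fileno)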

With the habit that we have built from writing thousands of __init__ methods, we might want to write our FileReader class like this:

class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = fileio.open(path)
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

For our initial use-case, this is fine. Client code creates a FileReader by doing something like FileReader("./config.json"), which always creates a FileReader that maintains its file descriptor int internally as private state. This is as it should be; we don’t want user code to see or mess with _fd, as that might violate FileReader’s invariants. All the necessary work to construct a valid FileReader — i.e. the call to open — is always taken care of for you by FileReader.__init__.

However, additional requirements will creep in, and as they do, FileReader.__init__ becomes increasingly awkward.

Initially we only care about fileio.open, but later we may have to deal with a library that has its own reasons for managing the call to fileio.open by itself, and wants to hand us an int that we use as our _fd. Now we have to resort to weird workarounds like:

def reader_from_fd(fd: int) -> FileReader:
    fr = object.__new__(FileReader)
    fr._fd = fd
    return fr

Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd’s type signature, which takes a plain int, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int.

Testing is much more of a hassle, because we have to patch in our own copy of fileio.open any time we want an instance of a FileReader in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReaders for testing purposes.
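
To make that hassle concrete, here is roughly what such a test ends up looking like with the __init__-based FileReader above (a sketch using the standard library’s unittest.mock, assuming FileReader’s module does a plain import fileio; the test name is just for illustration):

from unittest import mock

def test_file_reader_reads() -> None:
    # We cannot even construct a FileReader without patching out the I/O
    # that __init__ insists on performing.
    with mock.patch("fileio.open", return_value=3):
        reader = FileReader("./config.json")
    assert reader._fd == 3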

All of this also assumes a fileio.open that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you’ve ever found yourself wanting to type async def __init__(self): ... then you have seen this limitation in practice.

Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are “often” associated with it, then by definition they are not “always” associated with it, and all the cases where they aren’t associated become unwieldy and potentially broken.

Defining an __init__ is an anti-pattern, and we need a replacement for it.

The Solutions

I believe this tripartite assemblage of design techniques will address the problems raised above:

  • using dataclass to define attributes,
  • replacing behavior that would previously have been in __init__ with a new classmethod that does the same thing, and
  • using precise types to describe what a valid instance looks like.

Using dataclass attributes to create an __init__ for you

To begin, let’s refactor FileReader into a dataclass. This does get us an __init__ method, but it won’t be an arbitrary one we define ourselves; it will have a useful constraint enforced on it: it will only assign attributes.

from dataclasses import dataclass
@dataclass
class FileReader:
    _fd: int
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

Except... oops. In fixing the problems that we created with our custom __init__ that calls fileio.open, we have re-introduced several problems that it solved:

  1. We have removed all the convenience of FileReader("path"). Now the user needs to import the low-level fileio.open again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a FileReader in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
  2. There’s no enforcement of the validity of _fd as a file descriptor; it’s just some integer, and the user could easily pass an incorrect one, with no error.

In isolation, dataclass by itself can’t solve all our problems, so let’s add in the second technique.

Using classmethod factories to create objects

We don’t want to require any additional imports, or require users to go looking at any other modules — or indeed anything other than FileReader itself — to figure out how to create a FileReader for its intended usage.

Luckily we have a tool that can easily address all of these concerns at once: @classmethod. Let’s define a FileReader.open class method:

from dataclasses import dataclass
from typing import Self
@dataclass
class FileReader:
    _fd: int
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(fileio.open(path))

Now, your callers can replace FileReader("path") with FileReader.open("path"), and get all the same benefits.

Additionally, if we needed to await fileio.open(...), and thus we needed its signature to be @classmethod async def open, we are freed from the constraint of __init__ as a special method. There is nothing that would prevent a @classmethod from being async, or indeed, from having any other modification to its return value, such as returning a tuple of related values rather than just the object being constructed.
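
For instance, assuming a hypothetical asynchronous fileio.open_async coroutine (not part of the imaginary module above; it is just here for illustration), an asynchronous constructor is simply another classmethod. A sketch, using a hypothetical AsyncFileReader so as not to redefine the class above:

from dataclasses import dataclass
from typing import Self

@dataclass
class AsyncFileReader:
    _fd: int
    @classmethod
    async def open(cls, path: str) -> Self:
        # Await the slow, error-prone work, then just assign attributes.
        return cls(await fileio.open_async(path))

A caller writes reader = await AsyncFileReader.open("./config.json"), a spelling that async def __init__ could never give you.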

Using NewType to address object validity

Next, let’s address the slightly trickier issue of enforcing object validity.

Our type signature calls this thing an int, and indeed, that is unfortunately what the lower-level fileio.open gives us, and that’s beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType:

from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)

There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let’s just insist to Mypy that we have versions of fileio.open, fileio.read, and fileio.close which actually already take FileDescriptor integers rather than regular ones.

from typing import Callable
_open: Callable[[str], FileDescriptor] = fileio.open  # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] = fileio.read
_close: Callable[[FileDescriptor], None] = fileio.close
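
As a quick check of what this buys us: FileDescriptor is still a plain int at runtime, but Mypy will now reject an arbitrary integer wherever a FileDescriptor is expected. A sketch of the difference, using the aliases defined above:

fd = FileDescriptor(3)  # zero cost at runtime; fd is just the int 3
_read(fd, 1024)         # accepted by Mypy
_read(3, 1024)          # rejected by Mypy: "int" is not "FileDescriptor"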

We do of course have to slightly adjust FileReader, too, but the changes are very small. Putting it all together, we get:

from dataclasses import dataclass
from typing import Self
@dataclass
class FileReader:
    _fd: FileDescriptor
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(_open(path))
    def read(self, length: int) -> bytes:
        return _read(self._fd, length)
    def close(self) -> None:
        _close(self._fd)

Note that the main technique here is not necessarily using NewType specifically, but rather aligning an instance’s property of “has all attributes set” as closely as possible with an instance’s property of “fully valid instance of its class”; NewType is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int, str or bytes.

In Summary - The New Best Practice

From now on, when you’re defining a new Python class:

  • Make it a dataclass2.
  • Use its default __init__ method3.
  • Add @classmethods to provide your users convenient and discoverable ways to build your objects.
  • Require that all dependencies be satisfied by attributes, so you always start with a valid object.
  • Use typing.NewType to enforce any constraints on primitive data types (like int and str) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on.

If you define all your classes this way, you will get all the benefits of a custom __init__ method:

  • All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid.
  • Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself.

Along with some nice new benefits:

  • You will be future-proofed against new requirements for different ways that users may need to construct your object.
  • If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like def __init__(self, maybe_a_filename: int | str | None = None):
  • Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects (see the sketch just below).
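
Returning to FileReader for a moment: a test can now construct as many readers as it likes with a made-up descriptor, with no patching and no real I/O anywhere. A minimal sketch, reusing the FileDescriptor type from earlier:

def test_constructing_file_readers() -> None:
    # No patching required: just satisfy the one dependency directly.
    fd = FileDescriptor(7)
    readers = [FileReader(fd) for _ in range(10)]
    assert all(reader._fd == fd for reader in readers)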

Before dataclasses, it was always a bit weird that such a basic feature of the Python language — giving data to a data structure to make it valid — required overriding a method with 4 underscores in its name. __init__ stuck out like a sore thumb. Other such methods like __add__ or even __repr__ were inherently customizing esoteric attributes of classes.

For many years now, that historical language wart has been resolved. @dataclass, @classmethod, and NewType give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “but what is a ‘class’, really?”.


  1. If you aren’t already familiar, a “file descriptor” is an integer which has meaning only within your program; you tell the operating system to open a file, it says “I have opened file 7 for you”, and then whenever you refer to “7” it is that file, until you close(7)

  2. Or an attrs class, if you’re nasty. 

  3. Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a “range” object where start must always be less than end. There are always exceptions to these types of rules. Still, it’s pretty much never a good idea to do any I/O in __init__, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a __post_init__ rather than writing a literal __init__

Small PINPal Update

I made a new release of PINPal today and that made me want to remind you all about it.

Today on stream, I updated PINPal to fix the memorization algorithm.

If you haven’t heard of PINPal before, it is a vault password memorization tool. For more detail on what that means, you can check out the README, and why not give it a ⭐ while you’re at it.


As I started writing up an update post I realized that I wanted to contextualize it a bit more, because it’s a tool I really wish were more popular. It solves one of those small security problems that you can mostly ignore, right up until the point where it’s a huge problem and it’s too late to do anything about it.

In brief, PINPal helps you memorize new secure passcodes for things you actually have to remember and can’t simply put into your password manager, like the password to your password manager, your PC user account login, your email account1, or the PIN code to your phone or debit card.

Too often, even if you’re properly using a good password manager for your passwords, you’ll be protecting it with a password optimized for memorability, which is to say, one that isn’t random and thus isn’t secure. But I have also seen folks veer too far in the other direction, trying to make a really secure password that they then forget right after switching to a password manager. Forgetting your vault password can also be a really big deal, making you do password resets across every app you’ve loaded into it so far, so having an opportunity to practice it periodically is important.

PINPal uses spaced repetition to ensure that you remember the codes it generates.

While periodic forced password resets are a bad idea, if (and only if!) you can actually remember the new password, it is a good idea to get rid of old passwords eventually — like, let’s say, when you get a new computer or phone. Doing so reduces the risk that a password stored somewhere on a very old hard drive or darkweb data dump is still floating around out there, forever haunting your current security posture. If you do a reset every 2 years or so, you know you’ve never got more than 2 years of history to worry about.

PINPal is also particularly secure in the way it incrementally generates your password; the computer you install it on only ever holds the entire password in memory, and only while you are typing it in. Even the partial fragments that you are in the process of memorizing are stored using the secure keyring module, avoiding plain text whenever possible.


I’ve been using PINPal to generate and memorize new codes for a while, just in case2, and the change I made today was because I encountered a recurring problem. The problem was that I’d forget a token after it had been hidden, and there was never any going back. The moment that a token was hidden from the user, it was removed from storage, so you could never get a reminder. While I’ve successfully memorized about 10 different passwords with it so far, I’ve had to delete 3 or 4.

So, in the updated algorithm, the visual presentation now hides tokens in the prompt several memorizations before they’re removed. Previously, if the password you were generating was ‘hello world’, you’d see hello world 5 times or so, then •••• world; if you ever got it wrong past that point, too bad, start over. Now, you’ll see hello world, then °°°° world, then after you have gotten the prompt right without seeing the token a few times, you’ll see •••• world after the backend has locked it in and it’s properly erased from your computer.

If you get the prompt wrong, breaking your streak reveals the recently-hidden token until you get it right again. I also did a new release on that same livestream, so if this update sounds like it might make the memorization process more appealing, check it out via pip install pinpal today.

Right now this tool is still really only for an extremely specific type of nerd — it’s command-line only, and you probably need to hand-customize your shell prompt to invoke it periodically. But I’m working on making it more accessible to a broader audience. It’s open source, of course, so you can feel free to contribute your own code!

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more things like it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!


  1. Your email account password can be stored in your password manager, of course, but given that email is the root-of-trust reset factor for so many things, being able to remember that password is very helpful in certain situations. 

  2. Funny story: at one point, Apple had an outage which made it briefly appear as if a lot of people needed to reset their iCloud passwords, myself included. Because I’d been testing PINPal a bunch, I actually had several highly secure random passwords already memorized. It was a strange feeling to just respond to the scary password reset prompt with a new, highly secure password and just continue on with my day secure in the knowledge I wouldn't forget it. 

The “Active Enum” Pattern

Enums are objects, why not give them attributes?

Have you ever written some Python code that looks like this?

from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

def behavior(number: SomeNumber) -> int:
    match number:
        case SomeNumber.one:
            print("one!")
            return 1
        case SomeNumber.two:
            print("two!")
            return 2
        case SomeNumber.three:
            print("three!")
            return 3

That is to say, have you written code that:

  1. defined an enum with several members
  2. associated custom behavior, or custom values, with each member of that enum,
  3. needed one or more match / case statements (or, if you’ve been programming in Python for more than a few weeks, probably a big if/elif/elif/else tree) to do that association?

In this post, I’d like to submit that this is an antipattern; let’s call it the “passive enum” antipattern.

For those of you having a generally positive experience organizing your discrete values with enums, it may seem odd to call this an “antipattern”, so let me first make something clear: the path to a passive enum is going in the correct direction.

Typically - particularly in legacy code that predates Python 3.4 - one begins with a value that is a bare int constant, or maybe a str with some associated values sitting beside it in a few global dicts.

Starting from there, collecting all of your values into an enum at all is a great first step. Having an explicit listing of all valid values and verifying against them is great.

But, it is a mistake to stop there. There are problems with passive enums, too:

  1. The behavior can be defined somewhere far away from the data, making it difficult to:
    1. maintain an inventory of everywhere it’s used,
    2. update all the consumers of the data when the list of enum values changes, and
    3. learn about the different usages as a consumer of the API
  2. Logic may be defined procedurally (via if/elif or match) or declaratively (via e.g. a dict whose keys are your enum and whose values are the required associated value).
    1. If it’s defined procedurally, it can be difficult to build tools to interrogate it, because you need to parse the AST of your Python program. So it can be difficult to build interactive tools that look at the associated data without just calling the relevant functions.
    2. If it’s defined declaratively, it can be difficult for existing tools that do know how to interrogate ASTs (mypy, flake8, Pyright, ruff, et al.) to make meaningful assertions about it. Does your linter know how to check that a dict whose keys should be every value of your enum is complete? (See the sketch just after this list.)
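
For concreteness, here is what that declarative style typically looks like, using the SomeNumber enum from the example above; the NUMBER_RESULTS and declarative_behavior names are just for illustration. This is a sketch of the passive pattern, not a recommendation: nothing checks that every member has an entry, so a missing one only surfaces at runtime.

# The passive, declarative flavor: the association lives in a dict far away
# from the enum, and no tool verifies that every member has an entry.
NUMBER_RESULTS: dict[SomeNumber, int] = {
    SomeNumber.one: 1,
    SomeNumber.two: 2,
    # SomeNumber.three is silently missing; you find out via a KeyError.
}

def declarative_behavior(number: SomeNumber) -> int:
    return NUMBER_RESULTS[number]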

To refactor this, I would propose a further step towards organizing one’s enum-oriented code: the active enum.

An active enum is one which contains all the logic associated with the first-party provider of the enum itself.

You may recognize this as a more generalized restatement of the object-oriented lens on the principle of “separation of concerns”. The responsibilities of a class ought to be implemented as methods on that class, so that you can send messages to that class via method calls, and it’s up to the class internally to implement things. Enums are no different.

More specifically, you might notice it as a riff on the Active Nothing pattern described in this excellent talk by Sandi Metz, and, yeah, it’s the same thing.

The first refactoring that we can make is, thus, to mechanically move the behavior from an external function, living anywhere, to a method on SomeNumber. At least this way, we present an API to consumers externally that shows that SomeNumber has a behavior method that can be invoked.

from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

    def behavior(self) -> int:
        match self:
            case SomeNumber.one:
                print("one!")
                return 1
            case SomeNumber.two:
                print("two!")
                return 2
            case SomeNumber.three:
                print("three!")
                return 3

However, this still leaves us with a match statement that repeats all the values that we just defined, with no particular guarantee of completeness. To continue the refactoring, what we can do is change the value of the enum itself into a simple dataclass to structurally, by definition, contain all the fields we need:

from dataclasses import dataclass
from enum import Enum
from typing import Callable

@dataclass(frozen=True)
class NumberValue:
    result: int
    effect: Callable[[], None]

class SomeNumber(Enum):
    one = NumberValue(1, lambda: print("one!"))
    two = NumberValue(2, lambda: print("two!"))
    three = NumberValue(3, lambda: print("three!"))

    def behavior(self) -> int:
        self.value.effect()
        return self.value.result

Here, we give SomeNumber members a value of NumberValue, a dataclass that requires a result: int and an effect: Callable to be constructed. Mypy will properly notice that if x is a SomeNumber, then x.value will have the type NumberValue, and we will get proper type checking on its result (a static value) and effect (some associated behaviors)1.

Note that the implementation of the behavior method - still conveniently discoverable for callers, and with its signature unchanged - is now vastly simpler.
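
Callers are unchanged, which is the point; only the location of the association has moved. A quick usage sketch:

result = SomeNumber.two.behavior()     # prints "two!"
assert result == 2
assert SomeNumber.one.value.result == 1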

But what about...

Lookups?

You may be noticing that I have hand-waved over something important to many enum users, which is to say, by-value lookup. enum.auto will have generated int values for one, two, and three already, and by transforming those into NumberValue instances, I can no longer do SomeNumber(1).

For the simple, string-enum case, one where you might do class MyEnum: value = "value" so that you can do name lookups via MyEnum("value"), there’s a simple solution: use square brackets instead of round ones. In this case, with no matching strings in sight, SomeNumber["one"] still works.

But, if we want to do integer lookups with our dataclass version here, there’s a simple one-liner that will get them back for you; and, moreover, will let you do lookups on whatever attribute you want:

by_result = {each.value.result: each for each in SomeNumber}

enum.Flag?

You can do this with Flag more or less unchanged, but in the same way that you can’t expect all your list[T] behaviors to be defined on T, the lack of a 1-to-1 correspondence between Flag instances and their values makes it more complex and out of scope for this pattern specifically.

3rd-party usage?

Sometimes an enum is defined in library L and used in application A, where L provides the data and A provides the behavior. If this is the case, then some amount of version shear is unavoidable; this is a situation where the data and behavior have different vendors, and this means that other means of abstraction are required to keep them in sync. Object-oriented modeling methods are for consolidating the responsibility for maintenance within a single vendor’s scope of responsibility. Once you’re not responsible for the entire model, you can’t do the modeling over all of it, and that is perfectly normal and to be expected.

The goal of the Active Enum pattern is to avoid creating the additional complexity of that shear when it does not serve a purpose, not to ban it entirely.

A Case Study

I was inspired to make this post by a recent refactoring I did from a more obscure and magical2 version of this pattern into the version that I am presenting here, but if I am going to call passive enums an “antipattern” I feel like it behooves me to point at an example outside of my own solo work.

So, for a more realistic example, let’s consider a package that all Python developers will recognize from their day-to-day work, python-hearthstone, the Python library for parsing the data files associated with Blizzard’s popular computerized collectible card game Hearthstone.

As I’m sure you already know, there are a lot of enums in this library, but for one small case study, let’s look at a few of the methods in hearthstone.enums.GameType.

GameType has already taken “step 1” in the direction of an active enum, as I described above: as_bnet is an instance method on GameType itself, making it at least easy to see, by looking at the class definition, what operations it supports. However, in the implementation of that method (among many others) we can see the worst of both worlds:

class GameType(IntEnum):
    def as_bnet(self, format: FormatType = FormatType.FT_STANDARD):
        if self == GameType.GT_RANKED:
            if format == FormatType.FT_WILD:
                return BnetGameType.BGT_RANKED_WILD
            elif format == FormatType.FT_STANDARD:
                return BnetGameType.BGT_RANKED_STANDARD
            # ...
            else:
                raise ValueError()
        # ...
        return {
            GameType.GT_UNKNOWN: BnetGameType.BGT_UNKNOWN,
            # ...
            GameType.GT_BATTLEGROUNDS_DUO_FRIENDLY: BnetGameType.BGT_BATTLEGROUNDS_DUO_FRIENDLY,
        }[self]

We have procedural code mixed with a data lookup table; raise ValueError mixed together with value returns. Overall, it looks like this might be hard to maintain going forward, or to see what’s going on without a comprehensive understanding of the game being modeled. Of course for most Python programmers that understanding can be assumed, but, still.

If GameType were refactored in the manner above3, you’d be able to look at the member definition for GT_RANKED and see a mapping of FormatType to BnetGameType, or GT_BATTLEGROUNDS_DUO_FRIENDLY to see an unconditional value of BGT_BATTLEGROUNDS_DUO_FRIENDLY. Given that this enum has 40 elements, with several renamed or removed, it seems reasonable to expect that more will be added and removed as the game is developed.
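
To make that concrete, such a refactoring might look something like the sketch below, with a hypothetical GameTypeValue dataclass. This is only an illustration based on the excerpt above, not a drop-in patch for the real library: notably, the real GameType is an IntEnum whose integer values are meaningful, so an actual refactoring would need to preserve those too, and only two members are shown here.

from dataclasses import dataclass
from enum import Enum

# FormatType and BnetGameType are assumed to be imported from the library.

@dataclass(frozen=True)
class GameTypeValue:
    # Either a per-format mapping, or a single unconditional value.
    bnet_by_format: dict[FormatType, BnetGameType] | None = None
    bnet: BnetGameType | None = None

class GameType(Enum):
    GT_RANKED = GameTypeValue(
        bnet_by_format={
            FormatType.FT_WILD: BnetGameType.BGT_RANKED_WILD,
            FormatType.FT_STANDARD: BnetGameType.BGT_RANKED_STANDARD,
        }
    )
    GT_BATTLEGROUNDS_DUO_FRIENDLY = GameTypeValue(
        bnet=BnetGameType.BGT_BATTLEGROUNDS_DUO_FRIENDLY
    )

    def as_bnet(self, format: FormatType = FormatType.FT_STANDARD) -> BnetGameType:
        value = self.value
        if value.bnet_by_format is not None:
            if format not in value.bnet_by_format:
                raise ValueError(format)
            return value.bnet_by_format[format]
        if value.bnet is None:
            raise ValueError(self)
        return value.bnet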

Conclusion

If you have large enums that change over time, consider placing the responsibility for the behavior of the values alongside the values directly, and any logic for processing the values as methods of the enum. This will allow you to quickly validate that you have full coverage of any data that is required among all the different members of the enum, and it will allow API clients a convenient surface to discover the capabilities associated with that enum.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!


  1. You can get even fancier than this, defining a typing.Protocol as your enum’s value, but it’s best to keep things simple and use a very simple dataclass container if you can. 

  2. derogatory 

  3. I did not submit such a refactoring as a PR before writing this post because I don’t have full context for this library and I do not want to harass the maintainers or burden them with extra changes just to make a rhetorical point. If you do want to try that yourself, please file a bug first and clearly explain how you think it would benefit their project’s maintainability, and make sure that such a PR would be welcome. 

Against Innovation Tokens

The “innovation token” model for selecting technologies is bad, and here’s why.

Updated 2024-07-04: After some discussion, added an epilogue going into more detail about the value of the distinction between the two types of tokens.

In 2015, Dan McKinley laid out a model for software teams selecting technologies. He proposed that each team have a limited supply of “innovation tokens”, and, when selecting a technology, they can choose boring ones for free but “innovative” ones cost a token. This implies that we all know which technologies are innovative, and we assume that they are inherently costly, so we want to restrict their supply.

That model has become popular to the point that it is now part of the vernacular. In many discussions, it is accepted as received wisdom, or even common sense.

In this post I aim to show you that despite being superficially helpful, this model is wrong, and in fact, may be counterproductive. I believe it is an attractive nuisance in computer programming discourse.

In fairness to Mr. McKinley, the model he described in that post is:

  1. nearly a decade old at this point, and
  2. much more nuanced in its description of the problem with “innovation” than the subsequent memetic mutation of the concept.

While I will be referencing McKinley’s post, and I do take some issue with it, I am reacting more strongly to the life of its own that this idea has taken on once it escaped its original context. There are a zillion worse posts rehashing this concept, on blogs and LinkedIn, but I won’t be linking to them because the goal is not to call anybody out.

To some extent I am re-raising McKinley’s own caveats and reinforcing them. So I may be arguing with a strawman, but it’s a strawman I have seen deployed with some regularity over the years.

To reduce it to its core, this strawman is “don’t use new or interesting technology, and if you have to, only use a little bit”.


Within the broader culture of programmers, an “innovation token” has become a shorthand to smear any technology perceived — almost always based on vibes, not data — as risky, and the adoption of novel approaches as pretentious and unserious. Speaking of programmer culture though, I do have to acknowledge there is also a pervasive tendency for us to get distracted by novelty and waste time on puzzles rather than problem-solving, so I understand where the reactionary attitude represented by the concept of an innovation token comes from.

But it is reactionary.

At its worst, it borders on anti-intellectualism. I have heard it used on more than one occasion as a thought-terminating cliche to discard a potentially promising new tool. But before I get into that, let me try to give a sympathetic summary of the idea, because the model is not entirely bad.

It has been popular for a long time because it does work okay as a heuristic.


The real problem that McKinley is describing is operational overhead. When programmers make a technology selection, we are often considering how difficult it will make the programming. Innovative technology selections are, by definition, less mature.

That lack of maturity — particularly in the open source world — often means that the project is in a part of its lifecycle where it is concerned with development affordances more than operational ones. Therefore, the stereotypical innovative project, even one which might legitimately be a big improvement to development velocity, will create more operational overhead. That operational overhead creates a hidden cost for the operations team later on.

This is a point I emphatically agree with. When selecting a technology, you should consider its ease of operation more than its ease of development. If your team is successful, they will be operating and maintaining it far longer than they are initially integrating and deploying it.

Furthermore, some operational overhead is inevitable. You will need to hire people to mitigate it. More popular, more mature projects will have a bigger talent pool to hire from, so your training costs will be lower, and those training costs are part of your operational cost too.

Rationing innovation tokens therefore can work as a reasonable heuristic, or proxy metric, for avoiding a mess of complex operational problems associated with dependencies that are expensive to operate and hard to hire for.


There are some minor issues I want to point out before getting to the overarching one.

  1. “has a lot of operational overhead” is a stereotype of a new technology, not an inherent property. If you want to reject a technology on the basis of being too high-overhead, at least look into its actual overhead a little bit. Sometimes, especially in 2024 as opposed to 2015, the point of a new, shiny piece of tech is to address operational issues that the more boring, older one had.
  2. “hard to learn” is also a stereotype; if “newer” meant “harder” then we would all be using troff rather than Google Docs. Actually ask if the innovativeness is making things harder or easier; don’t assume.
  3. You are going to have to train people on your stack no matter what. If a technology is adding a lot of value, it’s absolutely worth hiring for general ability and making a plan to teach people about it. You are going to have to do this with the core technology of your product anyway.

As I said, though, these are minor issues. The big problem with modeling operational overhead as an “innovation token” is that an even bigger concern than selecting an innovative tool is selecting too many tools.


The impulse to select more tools and make your operational environment more complex can be made worse by trying to avoid innovative tools. The important thing is not “less innovation”, but more consistency. To illustrate this, let’s do a simple thought experiment.

Let’s say you’re going to make a web app. There’s a tool in Haskell that you really like for a critical part of your app’s problem domain. You don’t want to spend more than one innovation token though, and everything in Haskell is inherently innovative, so you write a little service that just does that one part and you write the rest of your app in Ruby, calling into that service whenever you need to use that thing. This will appropriately restrict your “innovation token” expenditure.

Does doing this actually reduce your operational overhead, though?

First, you will have to find a team that likes both Ruby and Haskell and sees no problem using both. If you are not familiar with the cultural proclivities of these languages, suffice it to say that this is unlikely. Hiring for Haskell programmers is hard because there are fewer of them than Ruby programmers, but hiring for polyglot Haskell/Ruby programmers who are happy to do either is going to be really hard.

Since you will need to find different people to write in the different languages, even in the best case scenario, you will have two teams: the Haskell team and the Ruby team. Even if you are incredibly disciplined about inter-service responsibilities, there will be some areas where duplication of code is necessary across those services. Disagreements will arise and every one of these disagreements will be a source of social friction and software defects.

Then, you need to set up separate CI pipelines for each language, separate deployment systems, and of course, separate databases. Right away you are effectively doubling your workload.

In the worse, and unfortunately more likely, scenario, there will be enormous infighting between these two teams. Operational incidents will be more difficult to manage because rather than learning the Haskell tools for operational visibility and disseminating that institutional knowledge amongst your team, you will be half-learning the lessons from two separate ecosystems and attempting to integrate them. Every on-call engineer will be frantically trying to learn a language ecosystem they don’t use regularly, or you will double the size of your on-call rotation. The Ruby team may start to resent the Haskell team for getting to exclusively work on the fun parts of the problem while they are doing things that look more like rote grunt work.


A better way to think about the problem of managing operational overhead is, rather than “innovation tokens”, consider “boundary tokens”.

That is to say, rather than evaluating the general sense of weird vibes from your architecture, consider the consistency of that architecture. If you’re using Haskell, use Haskell. You should be all-in on Haskell web frameworks, Haskell ORMs, Haskell OAuth integrations, and so on.1 To cross the boundary out of Haskell, you need to spend a boundary token, and you shouldn’t have many of those.

I submit that the increased operational overhead that you might experience with an all-Haskell tool selection will be dwarfed by the savings that you get by having a team that is aligned with each other, that can communicate easily, and that can share programs with each other without needing to first strategize about a channel for the two pieces of work to establish bidirectional communication. The ability to simply call a function when you need to call it is very powerful, and extremely underrated.

Consistency ought to apply at each layer of the stack; it is perhaps most obvious with programming languages, but it is true of web frameworks, test frameworks, cryptographic libraries, you name it. Make a choice and stick with it, because every deviation from that choice carries a significant cost. Moreover this cost is a hidden cost, in the same way that the operational downsides of an “innovative” tool that hasn’t seen much production use might be hidden.

Discarding a more standard tool in favor of a tool more consistent with your architecture extends even to fairly uncontroversial, ubiquitous tools. For example, one of my favorite architectural patterns is to forego the use of the venerable — and very boring — Cron, the UNIX task-scheduler. Instead of Cron, it can make a lot of sense to have hand-written bespoke code for scheduling tasks within the application. Within the “innovation tokens” model, this is a very silly waste of a token!


Just use Cron! Everybody knows how to use Cron!

Except… does everybody know how to use Cron? Here are some questions to consider, if you’re about to roll out a big dependency on Cron:

  1. How do you write a unit test for a scheduling rule with Cron?
  2. Can you even remember how to write a cron rule that runs at the times you want?
  3. How do you inject secrets and configuration variables into the distinct and somewhat idiosyncratic runtime execution environment of Cron?
  4. How do you know that you did that variable-injection properly until the job actually runs, possibly in the middle of the night?
  5. How do you deploy your monitoring and error-logging frameworks to observe your scripts run under Cron?
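
By way of contrast, the kind of bespoke in-application scheduler I’m describing can be driven by an ordinary, pure function over datetimes, which makes at least the first question trivial. A deliberately tiny sketch (next_run is just an illustration; a real scheduler would also track per-task state):

from datetime import datetime, timedelta

def next_run(last_run: datetime, interval: timedelta, now: datetime) -> datetime:
    # Pure calculation: no crontab syntax, no environment-variable injection,
    # no waiting until the middle of the night to find out whether it works.
    due = last_run + interval
    return due if due > now else now

def test_next_run() -> None:
    now = datetime(2024, 7, 4, 12, 0)
    # An overdue task should run immediately.
    assert next_run(datetime(2024, 7, 2), timedelta(days=1), now) == now
    # A task that is not yet due runs when its interval elapses.
    assert next_run(datetime(2024, 7, 4), timedelta(days=1), now) == datetime(2024, 7, 5)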

Granted, this architectural choice is less controversial than it once was. Cron used to be ambiently available on whatever servers you happened to be running. As container-based deployments have increased in popularity, this sense that Cron is just kinda around has gone away, and if you need to run a container that just runs Cron, much of the jankiness of its deployment is a lot more immediately visible.


There is friction at the boundary between things. That friction is a cost, but sometimes it’s a cost worth paying.

If there’s a really good library in Haskell and a really good library in Ruby and you really do want to use them both, maybe it makes sense to actually have multiple services. As your team gets larger and more mature, the need to bring in more tools, and the ability to handle the associated overhead, will only increase over time. But the place that the cost comes in the most is at the boundary between tools, not in the operational deficiencies of any one particular tool.

Even in a bog-standard web application with the most boring, least innovative tech stack imaginable (PHP, MySQL, HTML, CSS, JavaScript), many of the annoying points of friction are where different, inconsistent technologies make contact. If you are a programmer working on the web yourself, consider your own impression of the level of controversy of each of those technologies.

Consider that there are far more complex technical tools in terms of required skills to implement them, like computer vision or physics simulation, tools which are also pretty widely used, which consistently generate lower levels of controversy. People do have strong feelings about these things as well, of course, and it’s hard to find things to link to that show “this isn’t controversial”, but, like, search your feelings, you know it to be true.


You can see the benefits of the boundary token approach in programming language design. Many of the most influential and best-loved programming languages had an impact not by bundling together lots of tools, but by making everything into one thing:

  • LISP: everything is a list
  • Smalltalk: everything is an object
  • ML: everything is an algebraic data type
  • Forth: everything is a stack

There is a tremendous power in thinking about everything as a single kind of thing, because then you don’t have to juggle lots of different ideas about different kinds of things; you can just think about your problem.

When people complain about programming languages, they’re often complaining about how many different kinds of thing they have to remember in order to use it.

If you keep your boundary-token budget small, and allow your developers to accomplish as much as possible while staying within a solution space delineated by a single, clean cognitive boundary, I promise you can innovate as much as you want and your operational costs will remain manageable.


Epilogue

In subsequent Mastodon discussion of this post with Matt Campbell and Meejah, I realized that I may not have made it entirely clear why I feel the distinction between “boundary” and “innovation” tokens is important. I do say above that the “innovation token” model can be a useful heuristic, so why bother with a new, but slightly different heuristic? Especially since most experienced engineers - indeed, McKinley himself - would budget “innovation” quite similarly to “boundaries”, and might even consider the use of more “innovative” Haskell tools in my hypothetical scenario to not even be an expenditure of innovation tokens at all.

To answer that, I need to highlight the purpose of having heuristics like this in the first place. These are vague, nebulous guidelines, not hard and fast rules. I cannot give you a token calculator to plug your technical decisions into. The purpose of either token heuristic is to facilitate discussions among a team.

With a team of skilled and experienced engineers, the distinction is meaningless. Senior and staff engineers (at least, the ones who deserve their level) will intuit the goals behind “innovation tokens” and inherently consider things like operational overhead anyway. In practice, a high-performing, well-aligned team discussing innovation tokens and one discussing boundary tokens will look functionally indistinguishable.

The distinction starts to be important when you have management pressures, nervous executives, inexperienced engineers, a fresh team without existing consensus about core technology choices, and so on. That is to say, most teams that exist in the messy, perpetually in medias res world of the software industry.

If you are just getting started on a project and you have a bunch of competent but disagreeable engineers, the words “innovation” and “boundaries” function very differently.

If you ask, “is this an innovation” about a particular technical tool, you are asking your interlocutor to pull in a bunch of their skills and experience to subjectively evaluate the relative industry-wide, or maybe company-wide, or maybe team-wide2 newness of the thing being discussed. The discussion of whether it counts as boring or innovative is immediately fraught with a ton of subjective, difficult-to-quantify information about costs of hiring, difficulty of learning, and your impression of the feelings of hundreds or thousands of people outside of your team. And, yes, ultimately you do need to have an estimate of all that stuff, but starting your “is it OK to use this” conversation by simultaneously arguing about all those subjective judgments is setting yourself up for failure.

Instead, if you ask “does this introduce a boundary between two different technologies with different conceptual models”, while that is not a perfectly objective question, it is much easier for your team to answer, with much crisper intermediary factual questions. What are the two technologies? What are the models? How much do they differ? You can just hash out the answers to each one within the team directly, rather than needing to sift through the last few years of Stack Overflow developer surveys to determine relative adoption or popularity of technologies in the world at large.

Restricting your supply of either boundary or innovation tokens is a good idea, but achieving unanimity within your team about what your boundaries are is always going to be easier than deciding what your innovations are.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how can we make our architecture more consistent”.


  1. I gave a talk about this once, a very long time ago, where Haskell was Python. 

  2. It’s not clear, that’s a big part of the problem. 

DBXS 0.1.0

Today there is a new release of my database access and query organizer library with support for MySQL, PostgreSQL, and asyncio.

New Release

Yesterday I published a new release of DBXS for you all. It’s still ZeroVer, but it has graduated from double-ZeroVer as this is the first nonzero minor version.

More to the point though, the meaning of that version increment is that this version introduces some critical features that I think most people would need in order to give it a spin on a hobby project.

What’s New

  • It has support for MySQL and PostgreSQL using native asyncio drivers, which means you don’t need to take a Twisted dependency in production.

  • While Twisted is still used for some of the testing internals, Deferred is no longer exposed anywhere in the public API, either; your tests can happily pretend that they’re doing asyncio, as long as they can run against SQLite.

  • There is a new repository convenience function that automatically wires together multiple accessors and transaction discipline. Have a look at the docstring for a sense of how to use it.

  • Several papercuts, like confusing error messages when messing up query result handling, and lack of proper handling of default arguments in access protocols, are now addressed.

It’s A Good Time To Contribute!

If you’ve been looking for an open source project to try your hand at contributing to, DBXS might be a great opportunity, for a few reasons:

  1. The team is quite small (just me, right now!), so it’s easy to get involved.

  2. It’s quite generally useful, so there’s a potential for an audience, but right now it doesn’t really have any production users; there’s still time to change things without a lot of ceremony.

  3. Unlike many other small starter projects, it’s got a test suite with 100% coverage, so you can contribute with confidence that you’re not breaking anything.

  4. There’s not that much code (a bit over 2 thousand SLOC), so it’s not hard to get your head around.

  5. There are a few obvious next steps for improvement, which I’ve filed as issues if you want to pick one up.

Share and enjoy, and please let me know if you do something fun with it.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics such as “How do I shot SQL?”.