The “Active Enum” Pattern

Enums are objects, why not give them attributes?

Have you ever written some Python code that looks like this?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

def behavior(number: SomeNumber) -> int:
    match number:
        case SomeNumber.one:
            print("one!")
            return 1
        case SomeNumber.two:
            print("two!")
            return 2
        case SomeNumber.three:
            print("three!")
            return 3

That is to say, have you written code that:

  1. defined an enum with several members
  2. associated custom behavior, or custom values, with each member of that enum,
  3. needed one or more match / case statements (or, if you’ve been programming in Python for more than a few weeks, probably a big if/elif/elif/else tree) to do that association?

In this post, I’d like to submit that this is an antipattern; let’s call it the “passive enum” antipattern.

For those of you having a generally positive experience organizing your discrete values with enums, it may seem odd to call this an “antipattern”, so let me first make something clear: the path to a passive enum is going in the correct direction.

Typically - particularly in legacy code that predates Python 3.4 - one begins with a value that is a bare int constant, or maybe a str with some associated values sitting beside in a few global dicts.

Starting from there, collecting all of your values into an enum at all is a great first step. Having an explicit listing of all valid values and verifying against them is great.

But, it is a mistake to stop there. There are problems with passive enums, too:

  1. The behavior can be defined somewhere far away from the data, making it difficult to:
    1. maintain an inventory of everywhere it’s used,
    2. update all the consumers of the data when the list of enum values changes, and
    3. learn about the different usages as a consumer of the API
  2. Logic may be defined procedurally (via if/elif or match) or declaratively (via e.g. a dict whose keys are your enum and whose values are the required associated value).
    1. If it’s defined procedurally, it can be difficult to build tools to interrogate it, because you need to parse the AST of your Python program. So it can be difficult to build interactive tools that look at the associated data without just calling the relevant functions.
    2. If it’s defined declaratively, it can be difficult for existing tools that do know how to interrogate ASTs (mypy, flake8, Pyright, ruff, et. al.) to make meaningful assertions about it. Does your linter know how to check that a dict whose keys should be every value of your enum is complete?

To refactor this, I would propose a further step towards organizing one’s enum-oriented code: the active enum.

An active enum is one which contains all the logic associated with the first-party provider of the enum itself.

You may recognize this as a more generalized restatement of the object-oriented lens on the principle of “separation of concerns”. The responsibilities of a class ought to be implemented as methods on that class, so that you can send messages to that class via method calls, and it’s up to the class internally to implement things. Enums are no different.

More specifically, you might notice it as a riff on the Active Nothing pattern described in this excellent talk by Sandi Metz, and, yeah, it’s the same thing.

The first refactoring that we can make is, thus, to mechanically move the method from an external function living anywhere, to a method on SomeNumber . At least like this, we present an API to consumers externally that shows that SomeNumber has a behavior method that can be invoked.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

    def behavior(self) -> int:
        match self:
            case SomeNumber.one:
                print("one!")
                return 1
            case SomeNumber.two:
                print("two!")
                return 2
            case SomeNumber.three:
                print("three!")
                return 3

However, this still leaves us with a match statement that repeats all the values that we just defined, with no particular guarantee of completeness. To continue the refactoring, what we can do is change the value of the enum itself into a simple dataclass to structurally, by definition, contain all the fields we need:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
from dataclasses import dataclass
from enum import Enum
from typing import Callable

@dataclass(frozen=True)
class NumberValue:
    result: int
    effect: Callable[[], None]

class SomeNumber(Enum):
    one = NumberValue(1, lambda: print("one!"))
    two = NumberValue(2, lambda: print("two!"))
    three = NumberValue(3, lambda: print("three!"))

    def behavior(self) -> int:
        self.value.effect()
        return self.value.result

Here, we give SomeNumber members a value of NumberValue, a dataclass that requires a result: int and an effect: Callable to be constructed. Mypy will properly notice that if x is a SomeNumber, that x will have the type NumberValue and we will get proper type checking on its result (a static value) and effect (some associated behaviors)1.

Note that the implementation of behavior method - still conveniently discoverable for callers, and with its signature unchanged - is now vastly simpler.

But what about...

Lookups?

You may be noticing that I have hand-waved over something important to many enum users, which is to say, by-value lookup. enum.auto will have generated int values for one, two, and three already, and by transforming those into NumberValue instances, I can no longer do SomeNumber(1).

For the simple, string-enum case, one where you might do class MyEnum: value = “value” so that you can do name lookups via MyEnum("value"), there’s a simple solution: use square brackets instead of round ones. In this case, with no matching strings in sight, SomeNumber["one"] still works.

But, if we want to do integer lookups with our dataclass version here, there’s a simple one-liner that will get them back for you; and, moreover, will let you do lookups on whatever attribute you want:

1
by_result = {each.value.result: each for each in SomeNumber}

enum.Flag?

You can do this with Flag more or less unchanged, but in the same way that you can’t expect all your list[T] behaviors to be defined on T, the lack of a 1-to-1 correspondence between Flag instances and their values makes it more complex and out of scope for this pattern specifically.

3rd-party usage?

Sometimes an enum is defined in library L and used in application A, where L provides the data and A provides the behavior. If this is the case, then some amount of version shear is unavoidable; this is a situation where the data and behavior have different vendors, and this means that other means of abstraction are required to keep them in sync. Object-oriented modeling methods are for consolidating the responsibility for maintenance within a single vendor’s scope of responsibility. Once you’re not responsible for the entire model, you can’t do the modeling over all of it, and that is perfectly normal and to be expected.

The goal of the Active Enum pattern is to avoid creating the additional complexity of that shear when it does not serve a purpose, not to ban it entirely.

A Case Study

I was inspired to make this post by a recent refactoring I did from a more obscure and magical2 version of this pattern into the version that I am presenting here, but if I am going to call passive enums an “antipattern” I feel like it behooves me to point at an example outside of my own solo work.

So, for a more realistic example, let’s consider a package that all Python developers will recognize from their day-to-day work, python-hearthstone, the Python library for parsing the data files associated with Blizzard’s popular computerized collectible card game Hearthstone.

As I’m sure you already know, there are a lot of enums in this library, but for one small case study, let’s look a few of the methods in hearthstone.enums.GameType.

GameType has already taken the “step 1” in the direction of an active enum, as I described above: as_bnet is an instancemethod on GameType itself, making it at least easy to see by looking at the class definition what operations it supports. However, in the implementation of that method (among many others) we can see the worst of both worlds:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
class GameType(IntEnum):
    def as_bnet(self, format: FormatType = FormatType.FT_STANDARD):
        if self == GameType.GT_RANKED:
            if format == FormatType.FT_WILD:
                return BnetGameType.BGT_RANKED_WILD
            elif format == FormatType.FT_STANDARD:
                return BnetGameType.BGT_RANKED_STANDARD
            # ...
            else:
                raise ValueError()
        # ...
        return {
            GameType.GT_UNKNOWN: BnetGameType.BGT_UNKNOWN,
            # ...
            GameType.GT_BATTLEGROUNDS_DUO_FRIENDLY: BnetGameType.BGT_BATTLEGROUNDS_DUO_FRIENDLY,
        }[self]

We have procedural code mixed with a data lookup table; raise ValueError mixed together with value returns. Overall, it looks like this might be hard to maintain this going forward, or to see what’s going on without a comprehensive understanding of the game being modeled. Of course for most python programmers that understanding can be assumed, but, still.

If GameType were refactored in the manner above3, you’d be able to look at the member definition for GT_RANKED and see a mapping of FormatType to BnetGameType, or GT_BATTLEGROUNDS_DUO_FRIENDLY to see an unconditional value of BGT_BATTLEGROUNDS_DUO_FRIENDLY. Given that this enum has 40 elements, with several renamed or removed, it seems reasonable to expect that more will be added and removed as the game is developed.

Conclusion

If you have large enums that change over time, consider placing the responsibility for the behavior of the values alongside the values directly, and any logic for processing the values as methods of the enum. This will allow you to quickly validate that you have full coverage of any data that is required among all the different members of the enum, and it will allow API clients a convenient surface to discover the capabilities associated with that enum.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!


  1. You can get even fancier than this, defining a typing.Protocol as your enum’s value, but it’s best to keep things simple and use a very simple dataclass container if you can. 

  2. derogatory 

  3. I did not submit such a refactoring as a PR before writing this post because I don’t have full context for this library and I do not want to harass the maintainers or burden them with extra changes just to make a rhetorical point. If you do want to try that yourself, please file a bug first and clearly explain how you think it would benefit their project’s maintainability, and make sure that such a PR would be welcome. 

Against Innovation Tokens

The “innovation token” model for selecting technologies is bad, and here’s why.

Updated 2024-07-04: After some discussion, added an epilogue going into more detail about the value of the distinction between the two types of tokens.

In 2015, Dan McKinley laid out a model for software teams selecting technologies. He proposed that each team have a limited supply of “innovation tokens”, and, when selecting a technology, they can choose boring ones for free but “innovative” ones cost a token. This implies that we all know which technologies are innovative, and we assume that they are inherently costly, so we want to restrict their supply.

That model has become popular to the point that it is now part of the vernacular. In many discussions, it is accepted as received wisdom, or even common sense.

In this post I aim to show you that despite being superficially helpful, this model is wrong, and in fact, may be counterproductive. I believe it is an attractive nuisance in computer programming discourse.

In fairness to Mr. McKinley, the model he described in this post is:

  1. nearly a decade old at this point, and
  2. much more nuanced in its description of the problem with “innovation” than the subsequent memetic mutation of the concept.

While I will be referencing McKinley’s post, and I do take some issue with it, I am reacting more strongly to the life of its own that this idea has taken on once it escaped its original context. There are a zillion worse posts rehashing this concept, on blogs and LinkedIn, but I won’t be linking to them because the goal is not to call anybody out.

To some extent I am re-raising McKinley’s own caveats and reinforcing them. So I may be arguing with a strawman, but it’s a strawman I have seen deployed with some regularity over the years.

To reduce it to its core, this strawman is “don’t use new or interesting technology, and if you have to, only use a little bit”.


Within the broader culture of programmers, an “innovation token” has become a shorthand to smear any technology perceived — almost always based on vibes, not data — as risky, and the adoption of novel approaches as pretentious and unserious. Speaking of programmer culture though, I do have to acknowledge there is also a pervasive tendency for us to get distracted by novelty and waste time on puzzles rather than problem-solving, so I understand where the reactionary attitude represented by the concept of an innovation token comes from.

But it is reactionary.

At its worst, it borders on anti-intellectualism. I have heard it used on more than one occasion as a thought-terminating cliche to discard a potentially promising new tool. But before I get into that, let me try to give a sympathetic summary of the idea, because the model is not entirely bad.

It has been popular for a long time because it does work okay as an heuristic.


The real problem that McKinley is describing is operational overhead. When programmers make a technology selection, we are often considering how difficult it will make the programming. Innovative technology selections are, by definition, less mature.

That lack of maturity — particularly in the open source world — often means that the project is in a part of its lifecycle where it is concerned with development affordances more than operational ones. Therefore, the stereotypical innovative project, even one which might legitimately be a big improvement to development velocity, will create more operational overhead. That operational overhead creates a hidden cost for the operations team later on.

This is a point I emphatically agree with. When selecting a technology, you should consider its ease of operation more than its ease of development. If your team is successful, they will be operating and maintaining it far longer than they are initially integrating and deploying it.

Furthermore, some operational overhead is inevitable. You will need to hire people to mitigate it. More popular, more mature projects will have a bigger talent pool to hire from, so your training costs will be lower, and those training costs are part of your operational cost too.

Rationing innovation tokens therefore can work as a reasonable heuristic, or proxy metric, for avoiding a mess of complex operational problems associated with dependencies that are expensive to operate and hard to hire for.


There are some minor issues I want to point out before getting to the overarching one.

  1. “has a lot of operational overhead” is a stereotype of a new technology, not an inherent property. If you want to reject a technology on the basis of being too high-overhead, at least look into its actual overhead a little bit. Sometimes, especially in 2024 as opposed to 2015, the point of a new, shiny piece of tech is to address operational issues that the more boring, older one had.
  2. “hard to learn” is also a stereotype; if “newer” meant “harder” then we would all be using troff rather than Google Docs. Actually ask if the innovativeness is making things harder or easier; don’t assume.
  3. You are going to have to train people on your stack no matter what. If a technology is adding a lot of value, it’s absolutely worth hiring for general ability and making a plan to teach people about it. You are going to have to do this with the core technology of your product anyway.

As I said, though, these are minor issues. The big problem with modeling operational overhead as an “innovation token” is that an even bigger concern than selecting an innovative tool is selecting too many tools.


The impulse to select more tools and make your operational environment more complex can be made worse by trying to avoid innovative tools. The important thing is not “less innovation”, but more consistency. To illustrate this, let’s do a simple thought experiment.

Let’s say you’re going to make a web app. There’s a tool in Haskell that you really like for a critical part of your app’s problem domain. You don’t want to spend more than one innovation token though, and everything in Haskell is inherently innovative, so you write a little service that just does that one part and you write the rest of your app in Ruby, calling into that service whenever you need to use that thing. This will appropriately restrict your “innovation token” expenditure.

Does doing this actually reduce your operational overhead, though?

First, you will have to find a team that likes both Ruby and Haskell and sees no problem using both. If you are not familiar with the cultural proclivities of these languages, suffice it to say that this is unlikely. Hiring for Haskell programmers is hard because there are fewer of them than Ruby programmers, but hiring for polyglot Haskell/Ruby programmers who are happy to do either is going to be really hard.

Since you will need to find different people to write in the different languages, even in the best case scenario, you will have two teams: the Haskell team and the Ruby team. Even if you are incredibly disciplined about inter-service responsibilities, there will be some areas where duplication of code is necessary across those services. Disagreements will arise and every one of these disagreements will be a source of social friction and software defects.

Then, you need to set up separate CI pipelines for each language, separate deployment systems, and of course, separate databases. Right away you are effectively doubling your workload.

In the worse, and unfortunately more likely scenario, there will be enormous infighting between these two teams. Operational incidents will be more difficult to manage because rather than learning the Haskell tools for operational visibility and disseminating that institutional knowledge amongst your team, you will be half-learning the lessons from two separate ecosystems and attempting to integrate them. Every on-call engineer will be frantically trying to learn a language ecosystem they don’t use regularly, or you will double the size of your on-call rotation. The Ruby team may start to resent the Haskell team for getting to exclusively work on the fun parts of the problem while they are doing things that look more like rote grunt work.


A better way to think about the problem of managing operational overhead is, rather than “innovation tokens”, consider “boundary tokens”.

That is to say, rather than evaluating the general sense of weird vibes from your architecture, consider the consistency of that architecture. If you’re using Haskell, use Haskell. You should be all-in on Haskell web frameworks, Haskell ORMs, Haskell OAuth integrations, and so on.1 To cross the boundary out of Haskell, you need to spend a boundary token, and you shouldn’t have many of those.

I submit that the increased operational overhead that you might experience with an all-Haskell tool selection will be dwarfed by the savings that you get by having a team that is aligned with each other, that can communicate easily, and that can share programs with each other without needing to first strategize about a channel for the two pieces of work to establish bidirectional communication. The ability to simply call a function when you need to call it is very powerful, and extremely underrated.

Consistency ought to apply at each layer of the stack; it is perhaps most obvious with programming languages, but it is true of web frameworks, test frameworks, cryptographic libraries, you name it. Make a choice and stick with it, because every deviation from that choice carries a significant cost. Moreover this cost is a hidden cost, in the same way that the operational downsides of an “innovative” tool that hasn’t seen much production use might be hidden.

Discarding a more standard tool in favor of a tool more consistent with your architecture extends even to fairly uncontroversial, ubiquitous tools. For example, one of my favorite architectural patterns is to forego the use of the venerable — and very boring – Cron, the UNIX task-scheduler. Instead of Cron, it can make a lot of sense to have hand-written bespoke code for scheduling tasks within the application. Within the “innovation tokens” model, this is a very silly waste of a token!


Just use Cron! Everybody knows how to use Cron!

Except… does everybody know how to use Cron? Here are some questions to consider, if you’re about to roll out a big dependency on Cron:

  1. How do you write a unit test for a scheduling rule with Cron?
  2. Can you even remember how to write a cron rule that runs at the times you want?
  3. How do you inject secrets and configuration variables into the distinct and somewhat idiosyncratic runtime execution environment of Cron?
  4. How do you know that you did that variable-injection properly until the job actually runs, possibly in the middle of the night?
  5. How do you deploy your monitoring and error-logging frameworks to observe your scripts run under Cron?

Granted, this architectural choice is less controversial than it once was. Cron used to be ambiently available on whatever servers you happened to be running. As container-based deployments have increased in popularity, this sense that Cron is just kinda around has gone away, and if you need to run a container that just runs Cron, much of the jankiness of its deployment is a lot more immediately visible.


There is friction at the boundary between things. That friction is a cost, but sometimes it’s a cost worth paying.

If there’s a really good library in Haskell and a really good library in Ruby and you really do want to use them both, maybe it makes sense to actually have multiple services. As your team gets larger and more mature, the need to bring in more tools, and the ability to handle the associated overhead, will only increase over time. But the place that the cost comes in the most is at the boundary between tools, not in the operational deficiencies of any one particular tool.

Even in a bog-standard web application with the most boring, least innovative tech stack imaginable (PHP, MySQL, HTML, CSS, JavaScript), many of the annoying points of friction are where different, inconsistent technologies make contact. If you are a programmer working on the web yourself, consider your own impression of the level of controversy of these technologies:

Consider that there are far more complex technical tools in terms of required skills to implement them, like computer vision or physics simulation, tools which are also pretty widely used, which consistently generate lower levels of controversy. People do have strong feelings about these things as well, of course, and it’s hard to find things to link to that show “this isn’t controversial”, but, like, search your feelings, you know it to be true.


You can see the benefits of the boundary token approach in programming language design. Many of the most influential and best-loved programming languages had an impact not by bundling together lots of tools, but by making everything into one thing:

  • LISP: everything is a list
  • Smalltalk: everything is an object
  • ML: everything is an algebraic data type
  • Forth: everything is a stack

There is a tremendous power in thinking about everything as a single kind of thing, because then you don’t have to juggle lots of different ideas about different kinds of things; you can just think about your problem.

When people complain about programming languages, they’re often complaining about how many different kinds of thing they have to remember in order to use it.

If you keep your boundary-token budget small, and allow your developers to accomplish as much as possible while staying within a solution space delineated by a single, clean cognitive boundary, I promise you can innovate as much as you want and your operational costs will remain manageable.


Epilogue

In subsequent Mastodon discussion of this post on with Matt Campbell and Meejah, I realized that I may not have made it entirely clear why I feel the distinction between “boundary” and “innovation” tokens is important. I do say above that the “innovation token” model can be a useful heuristic, so why bother with a new, but slightly different heuristic? Especially since most experienced engineers - indeed, McKinley himself - would budget “innovation” quite similarly to “boundaries”, and might even consider the use of more “innovative” Haskell tools in my hypothetical scenario to not even be an expenditure of innovation tokens at all.

To answer that, I need to highlight the purpose of having heuristics like this in the first place. These are vague, nebulous guidelines, not hard and fast rules. I cannot give you a token calculator to plug your technical decisions into. The purpose of either token heuristic is to facilitate discussions among a team.

With a team of skilled and experienced engineers, the distinction is meaningless. Senior and staff engineers (at least, the ones who deserve their level) will intuit the goals behind “innovation tokens” and inherently consider things like operational overhead anyway. In practice, a high-performing, well-aligned team discussing innovation tokens and one discussing boundary tokens will look functionally indistinguishable.

The distinction starts to be important when you have management pressures, nervous executives, inexperienced engineers, a fresh team without existing consensus about core technology choices, and so on. That is to say, most teams that exist in the messy, perpetually in medias res world of the software industry.

If you are just getting started on a project and you have a bunch of competent but disagreeable engineers, the words “innovation” and “boundaries” function very differently.

If you ask, “is this an innovation” about a particular technical tool, you are asking your interlocutor to pull in a bunch of their skills and experience to subjectively evaluate the relative industry-wide, or maybe company-wide, or maybe team-wide2 newness of the thing being discussed. The discussion of whether it counts as boring or innovative is immediately fraught with a ton of subjective, difficult-to-quantify information about costs of hiring, difficulty of learning, and your impression of the feelings of hundreds or thousands of people outside of your team. And, yes, ultimately you do need to have an estimate of all that stuff, but starting your “is it OK to use this” conversation by simultaneously arguing about all those subjective judgments is setting yourself up for failure.

Instead, if you ask “does this introduce a boundary between two different technologies with different conceptual models”, while that is not a perfectly objective question, it is much easier for your team to answer, with much crisper intermediary factual questions. What are the two technologies? What are the models? How much do they differ? You can just hash out the answers to each one within the team directly, rather than needing to sift through the last few years of Stack Overflow developer surveys to determine relative adoption or popularity of technologies in the world at large.

Restricting your supply of either boundary or innovation tokens is a good idea, but achieving unanimity within your team about what your boundaries are is always going to be easier than deciding what your innovations are.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how can we make our architecture more consistent”.


  1. I gave a talk about this once, a very long time ago, where Haskell was Python. 

  2. It’s not clear, that’s a big part of the problem. 

DBXS 0.1.0

Today there is a new release of my database access and query organizer library with support for MySQL, PostgreSQL, and asyncio.

New Release

Yesterday I published a new release of DBXS for you all. It’s still ZeroVer, but it has graduated from double-ZeroVer as this is the first nonzero minor version.

More to the point though, the meaning of that version increment this version introduces some critical features that I think most people would need to give it a spin on a hobby project.

What’s New

  • It has support for MySQL and PostgreSQL using native asyncio drivers, which means you don’t need to take a Twisted dependency in production.

  • While Twisted is still used for some of the testing internals, Deferred is no longer exposed anywhere in the public API, either; your tests can happily pretend that they’re doing asyncio, as long as they can run against SQLite.

  • There is a new repository convenience function that automatically wires together multiple accessors and transaction discipline. Have a look at the docstring for a sense of how to use it.

  • Several papercuts, like confusing error messages when messing up query result handling, and lack of proper handling of default arguments in access protocols, are now addressed.

It’s A Good Time To Contribute!

If you’ve been looking for an open source project to try your hand at contributing to, DBXS might be a great opportunity, for a few reasons:

  1. The team is quite small (just me, right now!), so it’s easy to get involved.

  2. It’s quite generally useful, so there’s a potential for an audience, but right now it doesn’t really have any production users; there’s still time to change things without a lot of ceremony.

  3. Unlike many other small starter projects, it’s got a test suite with 100% coverage, so you can contribute with confidence that you’re not breaking anything.

  4. There’s not that much code (a bit over 2 thousand SLOC), so it’s not hard to get your head around.

  5. There are a few obvious next steps for improvement, which I’ve filed as issues if you want to pick one up.

Share and enjoy, and please let me know if you do something fun with it.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics such as “How do I shot SQL?”.

Annotated At Runtime

PEP 593 is a bit vague on how you’re supposed to actually consume arguments to Annotated; here is my proposal.

PEP 0593 added the ability to add arbitrary user-defined metadata to type annotations in Python.

At type-check time, such annotations are… inert. They don’t do anything. Annotated[int, X] just means int to the type-checker, regardless of the value of X. So the entire purpose of Annotated is to provide a run-time API to consume metadata, which integrates with the type checker syntactically, but does not otherwise disturb it.

Yet, the documentation for this central purpose seems, while not exactly absent, oddly incomplete.

The PEP itself simply says:

A tool or library encountering an Annotated type can scan through the annotations to determine if they are of interest (e.g., using isinstance()).

But it’s not clear where “the annotations” are, given that the PEP’s entire “consuming annotations” section does not even mention the __metadata__ attribute where the annotation’s arguments go, which was only even added to CPython’s documentation. Its list of examples just show the repr() of the relevant type.

There’s also a bit of an open question of what, exactly, we are supposed to isinstance()-ing here. If we want to find arguments to Annotated, presumably we need to be able to detect if an annotation is an Annotated. But isinstance(Annotated[int, "hello"], Annotated) is both False at runtime, and also a type-checking error, that looks like this:

1
Argument 2 to "isinstance" has incompatible type "<typing special form>"; expected "_ClassInfo"

The actual type of these objects, typing._AnnotatedAlias, does not seem to have a publicly available or documented alias, so that seems like the wrong route too.

Now, it certainly works to escape-hatch your way out of all of this with an Any, build some version-specific special-case hacks to dig around in the relevant namespaces, access __metadata__ and call it a day. But this solution is … unsatisfying.

What are you looking for?

Upon encountering these quirks, it is understandable to want to simply ask the question “is this annotation that I’m looking at an Annotated?” and to be frustrated that it seems so obscure to straightforwardly get an answer to that question without disabling all type-checking in your meta-programming code.

However, I think that this is a slight misframing of the problem. Code that is inspecting parameters for an annotation is going to do something with that annotation, which means that it must necessarily be looking for a specific set of annotations. Therefore the thing we want to pass to isinstance is not some obscure part of the annotations’ internals, but the actual interesting annotation type from your framework or application.

When consuming an Annotated parameter, there are 3 things you probably want to know:

  1. What was the parameter itself? (type: The type you passed in.)
  2. What was the name of the annotated object (i.e.: the parameter name, the attribute name) being passed the parameter? (type: str)
  3. What was the actual type being annotated? (type: type)

And the things that we have are the type of the Annotated we’re querying for, and the object with annotations we are interrogating. So that gives us this function signature:

1
2
3
4
5
def annotated_by(
    annotated: object,
    kind: type[T],
) -> Iterable[tuple[str, T, type]]:
    ...

To extract this information, all we need are get_args and get_type_hints; no need for __metadata__ or get_origin or any other metaprogramming. Here’s a recipe:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def annotated_by(
    annotated: object,
    kind: type[T],
) -> Iterable[tuple[str, T, type]]:
    for k, v in get_type_hints(annotated, include_extras=True).items():
        all_args = get_args(v)
        if not all_args:
            continue
        actual, *rest = all_args
        for arg in rest:
            if isinstance(arg, kind):
                yield k, arg, actual

It might seem a little odd to be blindly assuming that get_args(...)[0] will always be the relevant type, when that is not true of unions or generics. Note, however, that we are only yielding results when we have found the instance type in the argument list; our arbitrary user-defined instance isn’t valid as a type annotation argument in any other context. It can’t be part of a Union or a Generic, so we can rely on it to be an Annotated argument, and from there, we can make that assumption about the format of get_args(...).

This can give us back the annotations that we’re looking for in a handy format that’s easy to consume. Here’s a quick example of how you might use it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
@dataclass
class AnAnnotation:
    name: str

def a_function(
    a: str,
    b: Annotated[int, AnAnnotation("b")],
    c: Annotated[float, AnAnnotation("c")],
) -> None:
    ...

print(list(annotated_by(a_function, AnAnnotation)))

# [('b', AnAnnotation(name='b'), <class 'int'>),
#  ('c', AnAnnotation(name='c'), <class 'float'>)]

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how do I do Python metaprogramming, but, like, not super janky”.

Safer, Not Later

How “Move Fast and Break Things” ruined the world by escaping the context that it was intended for.

Facebook — and by extension, most of Silicon Valley — rightly gets a lot of shit for its old motto, “Move Fast and Break Things”.

As a general principle for living your life, it is obviously terrible advice, and it leads to a lot of the horrific outcomes of Facebook’s business.

I don’t want to be an apologist for Facebook. I also do not want to excuse the worldview that leads to those kinds of outcomes. However, I do want to try to help laypeople understand how software engineers—particularly those situated at the point in history where this motto became popular—actually meant by it. I would like more people in the general public to understand why, to engineers, it was supposed to mean roughly the same thing as Facebook’s newer, goofier-sounding “Move fast with stable infrastructure”.

Move Slow

In the bad old days, circa 2005, two worlds within the software industry were colliding.

The old world was the world of integrated hardware/software companies, like IBM and Apple, and shrink-wrapped software companies like Microsoft and WordPerfect. The new world was software-as-a-service companies like Google, and, yes, Facebook.

In the old world, you delivered software in a physical, shrink-wrapped box, on a yearly release cycle. If you were really aggressive you might ship updates as often as quarterly, but faster than that and your physical shipping infrastructure would not be able to keep pace with new versions. As such, development could proceed in long phases based on those schedules.

In practice what this meant was that in the old world, when development began on a new version, programmers would go absolutely wild adding incredibly buggy, experimental code to see what sorts of things might be possible in a new version, then slowly transition to less coding and more testing, eventually settling into a testing and bug-fixing mode in the last few months before the release.

This is where the idea of “alpha” (development testing) and “beta” (user testing) versions came from. Software in that initial surge of unstable development was extremely likely to malfunction or even crash. Everyone understood that. How could it be otherwise? In an alpha test, the engineers hadn’t even started bug-fixing yet!

In the new world, the idea of a 6-month-long “beta test” was incoherent. If your software was a website, you shipped it to users every time they hit “refresh”. The software was running 24/7, on hardware that you controlled. You could be adding features at every minute of every day. And, now that this was possible, you needed to be adding those features, or your users would get bored and leave for your competitors, who would do it.

But this came along with a new attitude towards quality and reliability. If you needed to ship a feature within 24 hours, you couldn’t write a buggy version that crashed all the time, see how your carefully-selected group of users used it, collect crash reports, fix all the bugs, have a feature-freeze and do nothing but fix bugs for a few months. You needed to be able to ship a stable version of your software on Monday and then have another stable version on Tuesday.

To support this novel sort of development workflow, the industry developed new technologies. I am tempted to tell you about them all. Unit testing, continuous integration servers, error telemetry, system monitoring dashboards, feature flags... this is where a lot of my personal expertise lies. I was very much on the front lines of the “new world” in this conflict, trying to move companies to shorter and shorter development cycles, and move away from the legacy worldview of Big Release Day engineering.

Old habits die hard, though. Most engineers at this point were trained in a world where they had months of continuous quality assurance processes after writing their first rough draft. Such engineers feel understandably nervous about being required to ship their probably-buggy code to paying customers every day. So they would try to slow things down.

Of course, when one is deploying all the time, all other things being equal, it’s easy to ship a show-stopping bug to customers. Organizations would do this, and they’d get burned. And when they’d get burned, they would introduce Processes to slow things down. Some of these would look like:

  1. Let’s keep a special version of our code set aside for testing, and then we’ll test that for a few weeks before sending it to users.
  2. The heads of every department need to sign-off on every deployed version, so everyone needs to spend a day writing up an explanation of their changes.
  3. QA should sign off too, so let’s have an extensive sign-off process where each individual tester does a fills out a sign-off form.

Then there’s my favorite version of this pattern, where management decides that deploys are inherently dangerous, and everyone should probably just stop doing them. It typically proceeds in stages:

  1. Let’s have a deploy freeze, and not deploy on Fridays; don’t want to mess up the weekend debugging an outage.
  2. Actually, let’s extend that freeze for all of December, we don’t want to mess up the holiday shopping season.
  3. Actually why not have the freeze extend into the end of November? Don’t want to mess with Thanksgiving and the Black Friday weekend.
  4. Some of our customers are in India, and Diwali’s also a big deal. Why not extend the freeze from the end of October?
  5. But, come to think of it, we do a fair amount of seasonal sales for Halloween too. How about no deployments from October 10 onward?
  6. You know what, sometimes people like to use our shop for Valentine’s day too. Let’s just never deploy again.

This same anti-pattern can repeat itself with an endlessly proliferating list of “environments”, whose main role ends up being to ensure that no code ever makes it to actual users.

… and break things anyway

As you may have begun to suspect, there are a few problems with this style of software development.

Even back in the bad old days of the 90s when you had to ship disks in boxes, this methodology contained within itself the seeds of its own destruction. As Joel Spolsky memorably put it, Microsoft discovered that this idea that you could introduce a ton of bugs and then just fix them later came along with some massive disadvantages:

The very first version of Microsoft Word for Windows was considered a “death march” project. It took forever. It kept slipping. The whole team was working ridiculous hours, the project was delayed again, and again, and again, and the stress was incredible. [...] The story goes that one programmer, who had to write the code to calculate the height of a line of text, simply wrote “return 12;” and waited for the bug report to come in [...]. The schedule was merely a checklist of features waiting to be turned into bugs. In the post-mortem, this was referred to as “infinite defects methodology”.

Which lead them to what is perhaps the most ironclad law of software engineering:

In general, the longer you wait before fixing a bug, the costlier (in time and money) it is to fix.

A corollary to this is that the longer you wait to discover a bug, the costlier it is to fix.

Some bugs can be found by code review. So you should do code review. Some bugs can be found by automated tests. So you should do automated testing. Some bugs will be found by monitoring dashboards, so you should have monitoring dashboards.

So why not move fast?

But here is where Facebook’s old motto comes in to play. All of those principles above are true, but here are two more things that are true:

  1. No matter how much code review, automated testing, and monitoring you have some bugs can only be found by users interacting with your software.
  2. No bugs can be found merely by slowing down and putting the deploy off another day.

Once you have made the process of releasing software to users sufficiently safe that the potential damage of any given deployment can be reliably limited, it is always best to release your changes to users as quickly as possible.

More importantly, as an engineer, you will naturally have an inherent fear of breaking things. If you make no changes, you cannot be blamed for whatever goes wrong. Particularly if you grew up in the Old World, there is an ever-present temptation to slow down, to avoid shipping, to hold back your changes, just in case.

You will want to move slow, to avoid breaking things. Better to do nothing, to be useless, than to do harm.

For all its faults as an organization, Facebook did, and does, have some excellent infrastructure to avoid breaking their software systems in response to features being deployed to production. In that sense, they’d already done the work to avoid the “harm” of an individual engineer’s changes. If future work needed to be performed to increase safety, then that work should be done by the infrastructure team to make things safer, not by every other engineer slowing down.

The problem is that slowing down is not actually value neutral. To quote myself here:

If you can’t ship a feature, you can’t fix a bug.

When you slow down just for the sake of slowing down, you create more problems.

The first problem that you create is smashing together far too many changes at once.

You’ve got a development team. Every engineer on that team is adding features at some rate. You want them to be doing that work. Necessarily, they’re all integrating them into the codebase to be deployed whenever the next deployment happens.

If a problem occurs with one of those changes, and you want to quickly know which change caused that problem, ideally you want to compare two versions of the software with the smallest number of changes possible between them. Ideally, every individual change would be released on its own, so you can see differences in behavior between versions which contain one change each, not a gigantic avalanche of changes where any one of hundred different features might be the culprit.

If you slow down for the sake of slowing down, you also create a process that cannot respond to failures of the existing code.

I’ve been writing thus far as if a system in a steady state is inherently fine, and each change carries the possibility of benefit but also the risk of failure. This is not always true. Changes don’t just occur in your software. They can happen in the world as well, and your software needs to be able to respond to them.

Back to that holiday shopping season example from earlier: if your deploy freeze prevents all deployments during the holiday season to prevent breakages, what happens when your small but growing e-commerce site encounters a catastrophic bug that has always been there, but only occurs when you have more than 10,000 concurrent users. The breakage is coming from new, never before seen levels of traffic. The breakage is coming from your success, not your code. You’d better be able to ship a fix for that bug real fast, because your only other option to a fast turn-around bug-fix is shutting down the site entirely.

And if you see this failure for the first time on Black Friday, that is not the moment where you want to suddenly develop a new process for deploying on Friday. The only way to ensure that shipping that fix is easy is to ensure that shipping any fix is easy. That it’s a thing your whole team does quickly, all the time.

The motto “Move Fast And Break Things” caught on with a lot of the rest of Silicon Valley because we are all familiar with this toxic, paralyzing fear.

After we have the safety mechanisms in place to make changes as safe as they can be, we just need to push through it, and accept that things might break, but that’s OK.

Some Important Words are Missing

The motto has an implicit preamble, “Once you have done the work to make broken things safe enough, then you should move fast and break things”.

When you are in a conflict about whether to “go fast” or “go slow”, the motto is not supposed to be telling you that the answer is an unqualified “GOTTA GO FAST”. Rather, it is an exhortation to take a beat and to go through a process of interrogating your motivation for slowing down. There are three possible things that a person saying “slow down” could mean about making a change:

  1. It is broken in a way you already understand. If this is the problem, then you should not make the change, because you know it’s not ready. If you already know it’s broken, then the change simply isn’t done. Finish the work, and ship it to users when it’s finished.
  2. It is risky in a way that you don’t have a way to defend against. As far as you know, the change works, but there’s a risk embedded in it that you don’t have any safety tools to deal with. If this is the issue, then what you should do is pause working on this change, and build the safety first.
  3. It is making you nervous in a way you can’t articulate. If you can’t describe an known defect as in point 1, and you can’t outline an improved safety control as in step 2, then this is the time to let go, accept that you might break something, and move fast.

The implied context for “move fast and break things” is only in that third condition. If you’ve already built all the infrastructure that you can think of to build, and you’ve already fixed all the bugs in the change that you need to fix, any further delay will not serve you, do not have any further delays.

Unfortunately, as you probably already know,

This motto did a lot of good in its appropriate context, at its appropriate time. It’s still a useful heuristic for engineers, if the appropriate context is generally understood within the conversation where it is used.

However, it has clearly been taken to mean a lot of significantly more damaging things.

Purely from an engineering perspective, it has been reasonably successful. It’s less and less common to see people in the industry pushing back against tight deployment cycles. It’s also less common to see the basic safety mechanisms (version control, continuous integration, unit testing) get ignored. And many ex-Facebook engineers have used this motto very clearly under the understanding I’ve described here.

Even in the narrow domain of software engineering it is misused. I’ve seen it used to argue a project didn’t need tests; that a deploy could be forced through a safety process; that users did not need to be informed of a change that could potentially impact them personally.

Outside that domain, it’s far worse. It’s generally understood to mean that no safety mechanisms are required at all, that any change a software company wants to make is inherently justified because it’s OK to “move fast”. You can see this interpretation in the way that it has leaked out of Facebook’s engineering culture and suffused its entire management strategy, blundering through market after market and issue after issue, making catastrophic mistakes, making a perfunctory apology and moving on to the next massive harm.

In the decade since it has been retired as Facebook’s official motto, it has been used to defend some truly horrific abuses within the tech industry. You only need to visit the orange website to see it still being used this way.

Even at its best, “move fast and break things” is an engineering heuristic, it is not an ethical principle. Even within the context I’ve described, it’s only okay to move fast and break things. It is never okay to move fast and harm people.

So, while I do think that it is broadly misunderstood by the public, it’s still not a thing I’d ever say again. Instead, I propose this:

Make it safer, don’t make it later.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how do I make changes to my codebase, but, like, good ones”.