Updated 2020-07-19: While many of the conclusions in this article remain valid
and interesting, a few of them — particularly those related to raising run-time
exceptions rather than using Nones to signal errors — have been subtly
changed by the advent of mypy's ability to force the caller to check for
None automatically. Effectively, when this was written,
Python did not have real Optional[] types, and now, thanks to mypy, it
does, so you should probably use them!
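Here is a minimal sketch of what that looks like in practice, using a hypothetical `find_user` lookup. Annotating the return type as `Optional[str]` lets mypy reject a caller that uses the result without first ruling out None. The check happens when you run mypy. At run time, Python itself still ignores the annotation.

```python
from typing import Dict, Optional

# Hypothetical data store, for illustration only.
USERS: Dict[int, str] = {1: "alice"}


def find_user(user_id: int) -> Optional[str]:
    """Return the user's name, or None if there is no such user."""
    return USERS.get(user_id)


def greeting(user_id: int) -> str:
    name = find_user(user_id)
    # Calling name.title() before this check would be flagged by mypy,
    # because name might be None at that point.
    if name is None:
        return "Hello, stranger"
    # After the check, mypy narrows name from Optional[str] to str.
    return "Hello, " + name.title()
```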
NULL has, rightly, been called a
“billion dollar mistake”.
If that is so, then None is a hundred million dollar mistake, at least.
Forgetting, for the moment, about the
numerous
pitfalls of a C-style
NULL, Python’s None has a very significant problem of its own. Of course,
the problem is not None itself; the fact that the default return value of a
function is None is (in my humble opinion, at least) fine; it’s just a marker
that means “nothing to see here, move along”. The problem arises from values
which might be None, or might be some other, useful thing.
APIs present values in a number of ways. A value might be exposed as the
return value of a method, an attribute of an object, or an entry in a
collection data structure such as a list or dictionary. If a value presented
in an API might be None or it might be something else, every single
client of that API needs to check the type of the value it receives
before doing anything.
Since it is rude to use a “simple suite” (a line of code like if x: y() with
no newline after the colon), the minimum number of lines of code for
interacting with your API is now 4: one for the if statement, one for the
then clause, one for the else keyword, and one for the else clause.
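For example, suppose a hypothetical `lookup` function returns either a `Widget` or None. Each call site then ends up looking something like `use_widget` below, which spends four lines on what should be a single call:

```python
from typing import Optional


class Widget:
    def method(self) -> str:
        return "did the thing"


def lookup(name: str) -> Optional[Widget]:
    # Hypothetical API: a Widget for known names, None otherwise.
    return Widget() if name == "known" else None


def use_widget(name: str) -> str:
    value = lookup(name)
    if value is not None:          # 1: the if statement
        return value.method()      # 2: the then clause
    else:                          # 3: the else keyword
        return "no widget"         # 4: the else clause
```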
Worse than the code bloat required here, the default behavior if you forget to do this checking is that your code works sometimes (like when you’re testing it), and at other times (like when you put it into production) you get an unhelpful exception like this:
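As a reconstruction sketch, assume a hypothetical `lookup` function that quietly returns None for names it doesn't know. A caller that skips the check works on the inputs it was tested with. On any other input it fails with an `AttributeError` that says nothing about where the None came from:

```python
class Widget:
    def method(self) -> str:
        return "did the thing"


def lookup(name):
    # Hypothetical API: returns None for names it does not know.
    return Widget() if name == "known" else None


def careless_caller(name):
    value = lookup(name)
    return value.method()  # no None check


print(careless_caller("known"))  # did the thing
try:
    careless_caller("unexpected")
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'method'
```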
Of course NoneType doesn’t have an attribute called method, but why is
value a NoneType? Science may never know.
In languages with static type declarations, there’s a concept of an Option type. Simply put, in a language with option types, the API declares its result value as “maybe something, maybe null”, and then if the caller fails to account for the “null” case, it is a compile-time error.
Python doesn’t have this kind of ahead-of-time checking though, so what are we
to do? In order of my own personal preference, here are three strategies for
getting rid of maybe-None-maybe-not data types in your Python code.
1: Just Say No
Some APIs - especially those that require building deeply complex nested trees
of data structures - use None as a way to provide a convenient mechanism for
leaving a space for a future value to be filled out. Using such an API
sometimes looks like this:
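Here is a minimal sketch of that style, using a hypothetical `MyValue` whose fields start out as None and are filled in later by the caller:

```python
class MyValue:
    def __init__(self):
        # Each field starts as a None placeholder, so each one is really
        # "int or None" until somebody remembers to fill it in.
        self.foo = None
        self.bar = None
        self.baz = None

    def do_something(self):
        # Only works once all three fields have been assigned.
        return self.foo + self.bar + self.baz


value = MyValue()
value.foo = 1
value.bar = 2
value.baz = 3
print(value.do_something())  # 6
```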
In this case, the way to get rid of None is simple: just stop doing that.
Instead of .foo having an implicit type of “int or None”, just make it always be int, like this:
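One way to do that, sketched here with a dataclass and the same hypothetical field names, is to make every field a required constructor argument. A half-built `MyValue` then simply cannot be created:

```python
from dataclasses import dataclass


@dataclass
class MyValue:
    # No defaults: foo, bar, and baz must all be supplied, and are always ints.
    foo: int
    bar: int
    baz: int

    def do_something(self) -> int:
        return self.foo + self.bar + self.baz


print(MyValue(foo=1, bar=2, baz=3).do_something())  # 6
```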
Or, if do_something is the only method you’re going to call with this data
structure, opt for the even simpler:
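In that case, the whole class can collapse into a single function call. This sketch uses the same hypothetical names and a placeholder body:

```python
def do_something(foo: int, bar: int, baz: int) -> int:
    # Hypothetical stand-in for MyValue.do_something, with no object at all.
    return foo + bar + baz


print(do_something(foo=1, bar=2, baz=3))  # 6
```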
If MyValue has dozens of fields that need to be initialized by different
subsystems, so you actually want to pass around a partially-initialized value
object, consider the Builder pattern, which would make this code look like
the following:
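Here is a minimal Builder sketch. `MyValueBuilder` comes from the text. The `with_foo`/`with_bar`/`with_baz` method names are hypothetical. The maybe-None state lives only inside the builder, and `build()` is the single place that checks it:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MyValue:
    foo: int
    bar: int
    baz: int

    def do_something(self) -> int:
        return self.foo + self.bar + self.baz


class MyValueBuilder:
    """Collects fields from different subsystems; only build() yields a MyValue."""

    def __init__(self) -> None:
        self._foo: Optional[int] = None
        self._bar: Optional[int] = None
        self._baz: Optional[int] = None

    def with_foo(self, foo: int) -> "MyValueBuilder":
        self._foo = foo
        return self

    def with_bar(self, bar: int) -> "MyValueBuilder":
        self._bar = bar
        return self

    def with_baz(self, baz: int) -> "MyValueBuilder":
        self._baz = baz
        return self

    def build(self) -> MyValue:
        # Fail loudly here, once, rather than in every caller of do_something.
        if self._foo is None or self._bar is None or self._baz is None:
            raise ValueError("foo, bar, and baz must all be set before build()")
        return MyValue(foo=self._foo, bar=self._bar, baz=self._baz)


builder = MyValueBuilder()
builder.with_foo(1)  # e.g. filled in by one subsystem
builder.with_bar(2)  # ...and by another
builder.with_baz(3)
print(builder.build().do_something())  # 6
```

Because `MyValue` has no optional fields, only the builder's API ever mentions the incomplete state.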
This acknowledges that the partially-constructed MyValueBuilder is a
different type than MyValue, and, crucially, if you look at its API
documentation, it does not misleadingly appear to support the do_something
operation, which in fact requires foo, bar, and baz all to be initialized.
Wherever possible, just require that values of the appropriate type be passed
in from the start, and don’t ever default to None.
2: Make The Library Handle The Different States, Not The Caller
Sometimes, None is a placeholder indicating an implicit
state machine, where the
states are “initialized” and “not initialized”.
For example, imagine an RPC Client which may or may not be connected. You might have an API you have to use like this:
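Here is a sketch of that leaky API. It assumes a hypothetical `RPCClient` whose public `connection` attribute stays None until it connects, so every caller, like the hypothetical `send_greeting` below, has to repeat the check:

```python
from typing import List, Optional


class Connection:
    """Hypothetical transport that records what it sends."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    def __init__(self) -> None:
        # Public and maybe-None: every caller has to inspect it.
        self.connection: Optional[Connection] = None

    def connect(self) -> None:
        self.connection = Connection()


def send_greeting(rpc_client: RPCClient, outbox: List[bytes]) -> None:
    # Each caller repeats this check and invents its own fallback.
    if rpc_client.connection is None:
        outbox.append(b"hello")  # hold on to it and hope someone sends it later
    else:
        rpc_client.connection.send(b"hello")
```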
By leaking through the connection attribute, rpc_client is providing an
incomplete abstraction and foisting too much work off onto its callers. Instead,
callers should just have to do this:
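Here is one possible shape for that, again with hypothetical names. The client keeps a private `_connection` and a queue, and a single `send_message` method works in either state. The caller only writes the last two lines:

```python
from typing import List, Optional


class Connection:
    """Hypothetical transport that records what it sends."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    def __init__(self) -> None:
        # Private and maybe-None: only this class ever checks it.
        self._connection: Optional[Connection] = None
        self._pending: List[bytes] = []

    def send_message(self, data: bytes) -> None:
        if self._connection is None:
            self._pending.append(data)  # queue until connected
        else:
            self._connection.send(data)

    def connection_made(self, connection: Connection) -> None:
        self._connection = connection
        for data in self._pending:  # flush anything queued earlier
            connection.send(data)
        self._pending.clear()


# The caller never has to think about connection state:
rpc_client = RPCClient()
rpc_client.send_message(b"hello")
```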
Internally, rpc_client still has to maintain a private _connection
attribute which may or may not be present, but by hiding this implementation
detail, we centralize the complexity associated with managing that state in one
place, rather than polluting every caller with it, which makes for much better
API design.
Hopefully you agree that this is a good idea, but this is more about what to do than how to do it, so here are two strategies for achieving “make the library do it”:
2a: Use Placeholder Implementations
Rather than using None for the private _connection attribute,
rpc_client’s internal implementation could instead use a placeholder which
provides the same interface. Let’s say the expected interface of _connection in
this case is just a send method that takes some bytes. We could initialize
it with something like this:
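Here is a minimal sketch of such a placeholder, assuming that the real connection only needs a `send(data: bytes)` method. The placeholder accepts the same calls and just remembers them. The class name `BufferingConnection` is hypothetical:

```python
from typing import List


class BufferingConnection:
    """Placeholder with the same send() interface as a real connection.

    It cannot deliver anything yet, so it keeps the bytes for later.
    """

    def __init__(self) -> None:
        self.buffered: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.buffered.append(data)


placeholder = BufferingConnection()
placeholder.send(b"hello")
placeholder.send(b"world")
print(placeholder.buffered)  # [b'hello', b'world']
```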
This allows the code within rpc_client itself to blindly call
self._connection.send whether it’s actually connected already or not; upon
connecting, it could flush the buffered data onto the ready connection.
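Here is a sketch of that hand-off, with hypothetical names throughout. The client starts with a buffering placeholder, always calls `self._connection.send`, and on connecting replays the buffer onto the real connection before swapping it in. `RecordingConnection` stands in for a real transport:

```python
from typing import List, Protocol


class Sender(Protocol):
    def send(self, data: bytes) -> None: ...


class BufferingConnection:
    """Placeholder that keeps bytes until a real connection exists."""

    def __init__(self) -> None:
        self.buffered: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.buffered.append(data)


class RecordingConnection:
    """Stand-in for a real transport; records what it is asked to send."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    def __init__(self) -> None:
        self._buffer = BufferingConnection()
        # Never None: either the placeholder or the real connection.
        self._connection: Sender = self._buffer

    def send_message(self, data: bytes) -> None:
        # No None check needed; both objects have send().
        self._connection.send(data)

    def connection_made(self, connection: Sender) -> None:
        # Replay what was sent early, then use the real connection directly.
        for data in self._buffer.buffered:
            connection.send(data)
        self._buffer.buffered.clear()
        self._connection = connection
```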
2b: Use an Explicit State Machine
Sometimes, you actually have quite a few states you need to manage, and this
starts looking like an ugly proliferation of lots of weird little flags;
various values which may be True or False, or None or not-None.
In those cases, it’s best to be explicit about the fact that there are multiple states, and to enumerate the valid transitions between them. Then, expose a method which always has the same signature and return type, regardless of the current state.
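Here is a plain-Python sketch of the idea, without any library. It uses an `enum.Enum` for the states and a single table of allowed transitions. The three states and all method names are hypothetical. `send_message` has the same signature and return type (None) in every state:

```python
from enum import Enum, auto
from typing import Dict, List, Set


class State(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()
    CONNECTED = auto()


# Every legal transition, spelled out in one place.
TRANSITIONS: Dict[State, Set[State]] = {
    State.DISCONNECTED: {State.CONNECTING},
    State.CONNECTING: {State.CONNECTED, State.DISCONNECTED},
    State.CONNECTED: {State.DISCONNECTED},
}


class RPCClient:
    def __init__(self) -> None:
        self.state = State.DISCONNECTED
        self.queued: List[bytes] = []
        self.sent: List[bytes] = []  # stand-in for writing to a socket

    def _move_to(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise RuntimeError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state

    def start_connecting(self) -> None:
        self._move_to(State.CONNECTING)

    def connection_made(self) -> None:
        self._move_to(State.CONNECTED)
        self.sent.extend(self.queued)
        self.queued.clear()

    def connection_lost(self) -> None:
        self._move_to(State.DISCONNECTED)

    def send_message(self, data: bytes) -> None:
        # Same signature and return type in every state.
        if self.state is State.CONNECTED:
            self.sent.append(data)
        else:
            self.queued.append(data)
```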
Using a state machine library like ClusterHQ’s “machinist” or my Automat can allow you to automate the process of checking all the states.
Automat, in particular, goes to great lengths to make your objects look like
plain old Python objects. Providing an input is just calling a method on your
object: my_state_machine.provide_an_input() and receiving an output is just
examining its return value. So it’s possible to refactor your code away from
having to check for None by using this library.
For example, the connection-handling example above could be dealt with in the RPC client using Automat like so:
3: Make The Caller Account For All Cases With Callbacks
The absolute lowest-level way to deal with multiple possible states is to, instead of exposing an attribute that the caller has to retrieve and test, expose a function which takes multiple callbacks, one for each case. This way you can provide clear and immediate error feedback if the caller forgets to handle a case - meaning that they forgot to pass a callback. This is only really suitable if you can’t think of any other way to handle it, but it does at least provide a very clear expectation of the interface.
To re-use our connection-handling logic above, you might do something like this:
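Here is a sketch with a hypothetical `with_connection` method. Both callbacks are required keyword-only arguments, so a caller who forgets one gets an immediate `TypeError` from Python itself. Only `connection_present` receives a connection:

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class Connection:
    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    def __init__(self) -> None:
        self._connection: Optional[Connection] = None  # private; callers never see it

    def connection_made(self, connection: Connection) -> None:
        self._connection = connection

    def with_connection(
        self,
        *,
        connection_present: Callable[[Connection], T],
        connection_not_present: Callable[[], T],
    ) -> T:
        # Exactly one callback runs, and only the first one gets a connection.
        if self._connection is None:
            return connection_not_present()
        return connection_present(self._connection)


def send_hello(conn: Connection) -> str:
    conn.send(b"hello")
    return "sent"


def report_offline() -> str:
    return "not connected"


rpc_client = RPCClient()
status = rpc_client.with_connection(
    connection_present=send_hello, connection_not_present=report_offline
)
print(status)  # not connected
```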
Notice that while this is slightly awkward, it has the nice property that the
connection_present callback receives the value that it needs, whereas the
connection_not_present callback doesn’t receive anything, because there’s
nothing for it to receive.
The Zeroth Strategy
Of course, the best strategy, if you can get away with it, may be the
non-strategy: refuse the temptation to provide a maybe-None, and just raise an
exception when you are in a state you can’t handle. If you intentionally
raise a specific, meaningful exception type with a good error message, your API
will be a lot more pleasant to use than if return codes that the caller has
to check pop up all over the place, None or otherwise.
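Here is a minimal sketch with hypothetical names. Instead of returning None when there is no connection, `send_message` raises a dedicated exception whose message says what went wrong and how to fix it:

```python
from typing import List, Optional


class NotConnected(Exception):
    """Raised when a message is sent before the client has connected."""


class Connection:
    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    def __init__(self) -> None:
        self._connection: Optional[Connection] = None

    def connection_made(self, connection: Connection) -> None:
        self._connection = connection

    def send_message(self, data: bytes) -> None:
        if self._connection is None:
            raise NotConnected(
                "send_message() was called before connection_made(); "
                "connect the client before sending"
            )
        self._connection.send(data)
```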
The Principle Of The Thing
The underlying principle here is the same: when designing an API, always
provide a consistent interface to your callers. An API is 1 to N: you have 1
API implementation to N callers. As N→∞, it becomes more important that any
task that needs performing frequently is performed on the “1” side of the
equation, and that you don’t force callers to repeat the same error checking
over and over again. None is just one form of this, but it is a particularly
egregious form.



