Updated 2020-07-19: While many of the conclusions in this article remain valid
and interesting, a few of them (particularly those related to raising run-time
exceptions rather than using Nones to signal errors) have been subtly changed by
the advent of mypy's ability to force the caller to check for None
automatically. Effectively, when this was written, Python did not have real
Optional types, and now, thanks to mypy, it does, so you should probably use
them!
NULL has, rightly, been called a “billion dollar mistake”. If that is so, then
None is a hundred million dollar mistake, at least.
Forgetting, for the moment, about the pitfalls of a C-style NULL, Python’s None
has a very significant problem of its own. Of course, the problem is not None
itself. The fact that the default return value of a Python function is None is
(in my humble opinion, at least) fine; it’s just a marker that means “nothing to
see here, move along”. The problem arises from values which might be None, or
might be some other, useful thing.
APIs present values in a number of ways. A value might be exposed as the
return value of a method, an attribute of an object, or an entry in a
collection data structure such as a list or dictionary. If a value presented
in an API might be None or might be something else, every single client of that
API needs to check the type of the value it receives before doing anything with
it.
Since it is rude to use a “simple suite” (a line of code like if x: y() with
no newline after the colon), the minimum number of lines of code for
interacting with your API is now 4: one for the if statement, one for the then
clause, one for the else: line, and one for the else clause.
Worse than the code bloat required here, the default behavior if you forget to do this checking is that it works sometimes (like when you’re testing it), and at other times (like when you put it into production), you get an unhelpful exception like this:
AttributeError: 'NoneType' object has no attribute 'method'
NoneType doesn’t have an attribute called method, but why is the value a
NoneType in the first place? Science may never know.
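Here is a minimal sketch of that failure mode. The names lookup() and Thing are hypothetical. The maybe-None function works for the input a test happens to use, then raises exactly this AttributeError for another input, far away from the reason the value was None. As the update note at the top says, annotating the return type as Optional[Thing] is what lets mypy flag the unchecked .method() calls before they ever run.

```python
from typing import Optional


class Thing:
    """Hypothetical value type handed out by the API."""

    def method(self) -> str:
        return "did something"


def lookup(key: str) -> Optional[Thing]:
    """Hypothetical maybe-None API: only known keys produce a Thing."""
    return Thing() if key == "known" else None


# Works in testing, where the key happens to exist...
print(lookup("known").method())  # did something

# ...and fails elsewhere, with no hint of why the value is None.
try:
    lookup("unknown").method()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'method'
```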
In languages with static type declarations, there’s a concept of an Option type. Simply put, in a language with option types, the API declares its result value as “maybe something, maybe null”, and then if the caller fails to account for the “null” case, it is a compile-time error.
Python doesn’t have this kind of ahead-of-time checking though, so what are we
to do? In order of my own personal preference, here are three strategies for
getting rid of maybe-None-maybe-not data types in your Python code.
1: Just Say No
Some APIs, especially those that require building deeply complex nested trees
of data structures, use None as a convenient way to leave a space for a value
that will be filled out later. Using such an API sometimes looks like this:
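Here is a minimal sketch of that style. MyValue, foo, bar, baz, and do_something are hypothetical names used throughout this section. Every field starts out as None and has to be filled in before the one useful method can run.

```python
class MyValue:
    """Hypothetical record whose fields start out as None placeholders."""

    def __init__(self) -> None:
        self.foo = None  # to be filled in "later", by someone, hopefully
        self.bar = None
        self.baz = None

    def do_something(self) -> int:
        # Silently assumes every placeholder has been replaced with an int.
        return self.foo + self.bar + self.baz


value = MyValue()
value.foo = 1
value.bar = 2
value.baz = 3
print(value.do_something())  # 6
```

If you forget one assignment, do_something() fails with a TypeError about NoneType, a long way from the line that was actually missing.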
In this case, the way to get rid of None is simple: just stop doing that.
Instead of .foo having an implicit type of “int or None”, just make it always be
int, like this:
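Here is a minimal sketch of the always-int version. It uses the same hypothetical field names and a standard-library dataclass, so the constructor itself demands every field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MyValue:
    """Hypothetical record that can only exist once every field is an int."""

    foo: int
    bar: int
    baz: int

    def do_something(self) -> int:
        return self.foo + self.bar + self.baz


print(MyValue(foo=1, bar=2, baz=3).do_something())  # 6
```

A partially-initialized MyValue simply cannot be constructed: MyValue(foo=1) raises TypeError immediately, at the call site that forgot the other fields.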
If do_something is the only method you’re going to call with this data
structure, opt for the even simpler:
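Here is a sketch of that simpler option, using the same hypothetical names. If there is only one operation, a plain function that takes the values directly needs no class and no placeholders at all.

```python
def do_something(foo: int, bar: int, baz: int) -> int:
    """Hypothetical stand-in for MyValue.do_something as a plain function."""
    return foo + bar + baz


print(do_something(1, 2, 3))  # 6
```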
If MyValue has dozens of fields that need to be initialized by different
subsystems, so that you actually want to pass around a partially-initialized
object, consider the Builder pattern, which would make this code look like the
following:
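Here is a minimal Builder sketch. The method names with_foo, with_bar, with_baz, and build are hypothetical choices for illustration. The builder records which fields have been set without ever storing None, and only build() produces a MyValue. If a field is still missing, build() fails with a specific error.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MyValue:
    """Hypothetical fully-initialized value; only this type has do_something()."""

    foo: int
    bar: int
    baz: int

    def do_something(self) -> int:
        return self.foo + self.bar + self.baz


class MyValueBuilder:
    """Hypothetical builder that collects fields from different subsystems."""

    _REQUIRED = ("foo", "bar", "baz")

    def __init__(self) -> None:
        self._fields: Dict[str, int] = {}  # absent key means "not set yet"; no None

    def with_foo(self, foo: int) -> "MyValueBuilder":
        self._fields["foo"] = foo
        return self

    def with_bar(self, bar: int) -> "MyValueBuilder":
        self._fields["bar"] = bar
        return self

    def with_baz(self, baz: int) -> "MyValueBuilder":
        self._fields["baz"] = baz
        return self

    def build(self) -> MyValue:
        missing = [name for name in self._REQUIRED if name not in self._fields]
        if missing:
            raise ValueError(f"cannot build MyValue; missing: {', '.join(missing)}")
        return MyValue(**self._fields)


builder = MyValueBuilder()
builder.with_foo(1)  # e.g. set by one subsystem
builder.with_bar(2)  # ...and these by others
builder.with_baz(3)
print(builder.build().do_something())  # 6
```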
This acknowledges that the partially-constructed MyValueBuilder is a different
type than MyValue, and, crucially, if you look at its API documentation, it does
not misleadingly appear to support the do_something operation, which in fact
requires that foo, bar, and baz all be initialized.
Wherever possible, just require that values of the appropriate type be passed in
from the start, and don’t ever default to None.
2: Make The Library Handle The Different States, Not The Caller
Sometimes, None is a placeholder indicating an implicit state machine, where
the states are “initialized” and “not initialized”.
For example, imagine an RPC client which may or may not be connected. You might have an API that you have to use like this:
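Here is a minimal sketch of such a leaky API. RPCClient, Connection, connect(), and the public connection attribute are hypothetical names. The client exposes a connection that may be None, so every caller has to test it first.

```python
from typing import List, Optional


class Connection:
    """Hypothetical live connection that records what it sends."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    """Hypothetical client that leaks its maybe-None connection to callers."""

    def __init__(self) -> None:
        self.connection: Optional[Connection] = None

    def connect(self) -> None:
        self.connection = Connection()


rpc_client = RPCClient()
# Every caller has to repeat this check before it can do anything useful.
if rpc_client.connection is None:
    rpc_client.connect()
rpc_client.connection.send(b"hello")
```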
By leaking through the fact that its connection may be None, rpc_client is
providing an incomplete abstraction and foisting off too much work onto its
callers. Instead, callers should just have to do this:
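Here is a minimal sketch of the caller-friendly version, using the same hypothetical names. For illustration only, it assumes the client may connect lazily on first use. The maybe-None attribute still exists, but it is private, and the single check for it lives inside send().

```python
from typing import List, Optional


class Connection:
    """Hypothetical live connection that records what it sends."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    """Hypothetical client that hides its connection state from callers."""

    def __init__(self) -> None:
        # Still maybe-None, but private: only this class ever checks it.
        self._connection: Optional[Connection] = None

    def send(self, data: bytes) -> None:
        if self._connection is None:
            self._connection = Connection()  # assumed: connect lazily on first use
        self._connection.send(data)


rpc_client = RPCClient()
rpc_client.send(b"hello")  # callers never see or test the connection
```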
Internally, rpc_client still has to maintain a private attribute which may or
may not be present, but by hiding this implementation detail, we centralize the
complexity associated with managing that state in one place, rather than
polluting every caller with it, which makes for a much better abstraction.
Hopefully you agree that this is a good idea, but it describes what to do rather than how to do it, so here are two strategies for achieving “make the library do it”:
2a: Use Placeholder Implementations
Rather than using None as the initial value of its connection, rpc_client’s
internal implementation could instead use a placeholder which provides the same
interface. Let’s say the expected interface of the connection in this case is
just a send method that takes some bytes. We could initialize it with this:
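Here is a minimal sketch of the placeholder approach. NoConnection, RecordingConnection, and connection_made() are hypothetical names. The placeholder has the same send() method as a real connection but buffers the data. When a real connection arrives, the client replays the buffer onto it and swaps it in, so send() never needs a None check.

```python
from typing import List, Protocol


class Connection(Protocol):
    """The interface rpc_client expects: just send(bytes)."""

    def send(self, data: bytes) -> None: ...


class NoConnection:
    """Hypothetical placeholder: same send() interface, but buffers the data."""

    def __init__(self) -> None:
        self.buffered: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.buffered.append(data)


class RecordingConnection:
    """Hypothetical stand-in for a real, ready connection."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    def __init__(self) -> None:
        self._placeholder = NoConnection()
        self._connection: Connection = self._placeholder  # never None

    def send(self, data: bytes) -> None:
        # No None check: placeholder and real connection share send().
        self._connection.send(data)

    def connection_made(self, connection: Connection) -> None:
        # Un-buffer whatever was sent while disconnected, then switch over.
        for data in self._placeholder.buffered:
            connection.send(data)
        self._placeholder.buffered.clear()
        self._connection = connection


client = RPCClient()
client.send(b"hello")
real = RecordingConnection()
client.connection_made(real)
client.send(b"world")
print(real.sent)  # [b'hello', b'world']
```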
This allows the code within
rpc_client itself to blindly call
self._connection.send whether it’s actually connected already or not; upon
connection, it could un-buffer that data onto the ready connection.
2b: Use an Explicit State Machine
Sometimes, you actually have quite a few states you need to manage, and this
starts looking like an ugly proliferation of lots of weird little flags: various
values which may be None or not-None.
In those cases it’s best to be clear about the fact that there are multiple
states, and to enumerate the valid transitions between them. Then, expose a
method which always has the same signature and return type.
Automat, in particular, goes to great lengths to make your objects look like
plain old Python objects. Providing an input is just calling a method on your
object, like my_state_machine.provide_an_input(), and receiving an output is
just examining its return value. So it’s possible to refactor your code away
from having to check for None by using this library.
For example, the connection-handling example above could be dealt with in the RPC client using Automat like so:
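As a rough stand-in that needs no third-party packages, here is a hand-rolled explicit state machine in plain Python. It illustrates the idea but does not use Automat or its API. The state names, the transition table, and connection_made()/connection_lost() are hypothetical. Every valid (state, input) pair is listed, and each public input is an ordinary method with a fixed signature. Any input that is invalid in the current state raises immediately instead of silently tripping over a None.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class State(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"


class RPCClient:
    """Hypothetical RPC client with explicit, enumerated states (not Automat)."""

    def __init__(self) -> None:
        self._state = State.DISCONNECTED
        self._pending: List[bytes] = []
        # Never None: until a transport arrives, writes fall back to the queue.
        self._write: Callable[[bytes], None] = self._queue
        # Every valid (state, input) pair, mapped to (next state, action).
        self._transitions: Dict[Tuple[State, str], Tuple[State, Callable[..., None]]] = {
            (State.DISCONNECTED, "send"): (State.DISCONNECTED, self._queue),
            (State.DISCONNECTED, "connection_made"): (State.CONNECTED, self._flush),
            (State.CONNECTED, "send"): (State.CONNECTED, self._transmit),
            (State.CONNECTED, "connection_lost"): (State.DISCONNECTED, self._reset),
        }

    # Public inputs: each one is just a method call with a fixed signature.
    def send(self, data: bytes) -> None:
        self._feed("send", data)

    def connection_made(self, write: Callable[[bytes], None]) -> None:
        self._feed("connection_made", write)

    def connection_lost(self) -> None:
        self._feed("connection_lost")

    def _feed(self, name: str, *args: object) -> None:
        key = (self._state, name)
        if key not in self._transitions:
            raise RuntimeError(f"{name!r} is not valid while {self._state.value}")
        self._state, action = self._transitions[key]
        action(*args)

    # Private actions, run only by the transitions above.
    def _queue(self, data: bytes) -> None:
        self._pending.append(data)

    def _flush(self, write: Callable[[bytes], None]) -> None:
        self._write = write
        for data in self._pending:
            write(data)
        self._pending.clear()

    def _transmit(self, data: bytes) -> None:
        self._write(data)

    def _reset(self) -> None:
        self._write = self._queue


wire: List[bytes] = []
client = RPCClient()
client.send(b"hello")  # queued while disconnected
client.connection_made(wire.append)
client.send(b"world")  # transmitted directly
print(wire)  # [b'hello', b'world']
```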
3: Make The Caller Account For All Cases With Callbacks
The absolute lowest-level way to deal with multiple possible states is to expose, instead of an attribute that the caller has to retrieve and test, a function which takes multiple callbacks, one for each case. This way you can provide clear and immediate error feedback if the caller forgets to handle a case, meaning that they forgot to pass a callback. This is only really suitable if you can’t think of any other way to handle it, but it does at least provide a very clear expectation of the interface.
To re-use our connection-handling logic above, you might do something like this:
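Here is a minimal sketch using the callback names from the text. The with_connection() method and the rest of the client are hypothetical. Making both callbacks keyword-only with no defaults means that a caller who forgets one gets a TypeError on the spot. Only the connected case receives the connection.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class Connection:
    """Hypothetical live connection that records what it sends."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class RPCClient:
    """Hypothetical client that makes callers handle both states."""

    def __init__(self) -> None:
        self._connection: Optional[Connection] = None  # private; never exposed

    def connect(self) -> None:
        self._connection = Connection()

    def with_connection(
        self,
        *,
        connection_present: Callable[[Connection], T],
        connection_not_present: Callable[[], T],
    ) -> T:
        # Keyword-only with no defaults: omitting either callback is an
        # immediate TypeError at the call site, not a latent bug.
        if self._connection is None:
            return connection_not_present()
        return connection_present(self._connection)


def send_hello(connection: Connection) -> str:
    connection.send(b"hello")
    return "sent"


client = RPCClient()
first = client.with_connection(
    connection_present=send_hello,
    connection_not_present=lambda: "not connected",
)
client.connect()
second = client.with_connection(
    connection_present=send_hello,
    connection_not_present=lambda: "not connected",
)
print(first, "/", second)  # not connected / sent
```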
Notice that while this is slightly awkward, it has the nice property that the
connection_present callback receives the value that it needs, whereas the
connection_not_present callback doesn’t receive anything, because there’s
nothing for it to receive.
The Zeroth Strategy
Of course, the best strategy, if you can get away with it, may be the
non-strategy: refuse the temptation to provide a maybe-None, and just raise an
exception when you are in a state where you can’t handle the request. If you
intentionally raise a specific, meaningful exception type with a good error
message, your API will be a lot more pleasant to use than if return codes that
the caller has to check for pop up all over the place, None or otherwise.
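Here is a minimal sketch of that non-strategy. NotConnectedError and the client are hypothetical. The method either does its job or raises a dedicated exception type whose message says what went wrong and how to fix it. It never hands back a None for the caller to trip over later.

```python
from typing import List


class NotConnectedError(Exception):
    """Hypothetical exception type: the client was used before connect()."""


class RPCClient:
    """Hypothetical client that refuses, loudly, to work while disconnected."""

    def __init__(self) -> None:
        self._connected = False
        self.sent: List[bytes] = []

    def connect(self) -> None:
        self._connected = True

    def send(self, data: bytes) -> None:
        if not self._connected:
            # Specific type plus an actionable message, instead of returning None.
            raise NotConnectedError("send() called before connect(); call connect() first")
        self.sent.append(data)


client = RPCClient()
try:
    client.send(b"hello")
except NotConnectedError as exc:
    print(exc)  # send() called before connect(); call connect() first
client.connect()
client.send(b"hello")
print(client.sent)  # [b'hello']
```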
The Principle Of The Thing
The underlying principle behind all of these strategies is the same: when designing an API, always
provide a consistent interface to your callers. An API is 1 to N: you have 1
API implementation to N callers. As N→∞, it becomes more important that any
task that needs performing frequently is performed on the “1” side of the
equation, and that you don’t force callers to repeat the same error checking
over and over again.
None is just one form of this, but it is a particularly