“Composition is better than inheritance.”. This is a true statement. “Inheritance is bad.” Also true. I’m a well-known compositional extremist. There’s a great talk you can watch if I haven’t talked your ear off about it already.
Which is why I was extremely surprised in a recent conversation when my interlocutor said that while inheritance might be bad, composition is worse. Once I understood what they meant by “composition”, I was even more surprised to find that I agreed with this assertion.
Although inheritance is bad, it’s very important to understand why. In a high-level language like Python, with first-class runtime datatypes (i.e.: user defined classes that are objects), the computational difference between what we call “composition” and what we call “inheritance” is a matter of where we put a pointer: is it on a type or on an instance? The important distinction has to do with human factors.
First, a brief parable about real-life inheritance.
You find yourself in conversation with an indolent heiress-in-waiting. She complains of her boredom whiling away the time until the dowager countess finally leaves her her fortune.
“Inheritance is bad”, you opine. “It’s better to make your own way in life”.
“By George, you’re right!” she exclaims. You weren’t expecting such an enthusiastic reversal.
“Well,”, you sputter, “glad to see you are turning over a new leaf”.
She crosses the room to open a sturdy mahogany armoire, and draws forth a belt holstering a pistol and a menacing-looking sabre.
“Auntie has only the dwindling remnants of a legacy fortune. The real money has always been with my sister’s manufacturing concern. Why passively wait for Auntie to die, when I can murder my dear sister now, and take what is rightfully mine!”
Cinching the belt around her waist, she strides from the room animated and full of purpose, no longer indolent or in-waiting, but you feel less than satisfied with your advice.
It is, after all, important to understand what the problem with inheritance is.
The primary reason inheritance is bad is confusion between namespaces.
The most important role of code organization (division of code into files, modules, packages, subroutines, data structures, etc) is division of responsibility. In other words, Conway’s Law isn’t just an unfortunate accident of budgeting, but a fundamental property of software design.
For example, if we have a function called multiply(a, b)
- its presence in
our codebase suggests that if someone were to want to multiply two numbers
together, it is multiply
’s responsibility to know how to do so. If there’s
a problem with multiplication, it’s the maintainers of multiply
who need to
go fix it.
And, with this responsibility comes authority over a specific scope within the
code. So if we were to look at an implementation of multiply
:
1 2 3 |
|
The maintainers of multiply
get to decide what product
means in the context
of their function. It’s possible, in Python, for some other funciton to
reach into multiply
with frame objects and mangle the meaning of product
between its assignment and return
, but it’s generally understood that it’s
none of your business what product
is, and if you touch it, all bets are
off about the correctness of multiply
. More importantly, if the maintainers
of multiply wanted to bind other names, or change around existing names, like
so, in a subsequent version:
1 2 3 4 5 |
|
It is the maintainer of multiply
’s job, not the caller of multiply
, to make
those decisions.
The same programmer may, at different times, be both a caller and a maintainer
of multiply
. However, they have to know which hat they’re wearing at any
given time, so that they can know which stuff they’re still repsonsible for
when they hand over multiply
to be maintained by a different team.
It’s important to be able to forget about the internals of the local variables in the functions you call. Otherwise, abstractions give us no power: if you have to know the internals of everything you’re using, you can never build much beyond what’s already there, because you’ll be spending all your time trying to understand all the layers below it.
Classes complicate this process of forgetting somewhat. Properties of class
instances “stick out”, and are visible to the callers. This can be powerful —
and can be a great way to represent shared data structures — but this is
exactly why we have the ._
convention in Python: if something starts with an
underscore, and it’s not in a namespace you own, you shouldn’t mess with it.
So: other._foo
is not for you to touch, unless you’re maintaining
type(other)
. self._foo
is where you should put your own private state.
So if we have a class like this:
1 2 3 |
|
we all know that A()._note
is off-limits.
But then what happens here?
1 2 3 4 |
|
B()._note
is also off limits for everyone but B
, except... as it turns out,
B
doesn’t really own the namespace of self
here, so it’s clashing with what
A
wants _note
to mean. Even if, right now, we were to change it to
_note2
, the maintainer of A
could, in any future release of A
, add a new
_note2
variable which conflicts with something B
is using. A
’s
maintainers (rightfully) think they own self
, B
’s maintainers (reasonably)
think that they do. This could continue all the way until we get to _note7
,
at which point it would explode violently.
So that’s why Inheritance is bad. It’s a bad way for two layers of a system to communicate because it leaves each layer nowhere to put its internal state that the other doesn’t need to know about. So what could be worse?
Let’s say we’ve convinced our junior programmer who wrote A
that inheritance
is a bad interface, and they should instead use the panacea that cures all
inherited ills, composition. Great! Let’s just write a B
that composes
in an A
in a nice clean way, instead of doing any gross inheritance:
1 2 3 4 |
|
Uh oh. Looks like composition is worse than inheritance.
Let’s enumerate some of the issues with this “solution” to the problem of inheritance:
- How do we know what attributes
Bprime
has? - How do we even know what type
a
is? - How is anyone ever going to
grep
for relevant methods in this code and have them come up in the right place?
We briefly reclaimed self
for Bprime
by removing the inheritance from A
,
but what Bprime
does in __init__
to replace it is much worse. At least
with normal, “vertical” inheritance, IDEs and code inspection tools can have
some idea where your parents are and what methods they declare. We have to
look aside to know what’s there, but at least it’s clear from the code’s
structure where exactly we have to look aside to.
When faced with a class like Bprime
though, what does one do? It’s just
shredding apart some apparently totally unrelated object, there’s nearly no way
for tooling to inspect this code to the point that they know where
self.<something>
comes from in a method defined on Bprime
.
The goal of replacing inheritance with composition is to make it clear and
easy to understand what code owns each attribute on self
. Sometimes that
clarity comes at the expense of a few extra keystrokes; an __init__
that
copies over a few specific attributes, or a method that does nothing but
forward a message, like def something(self): return self.other.something()
.
Automatic composition is just lateral inheritance. Magically auto-proxying all methods1, or auto-copying all attributes, saves a few keystrokes at the time some new code is created at the expense of hours of debugging when it is being maintained. If readability counts, we should never privilege the writer over the reader.
-
It is left as an exercise for the reader why
proxyForInterface
is still a reasonably okay idea even in the face of this criticism.2 ↩ -
Although ironically it probably shouldn’t use inheritance as its interface. ↩