The xUnit Paradox

Thursday September 13, 2007
Billions of years ago, at the dawn of the digital age, Kent Beck invented the idea of testing computer programs and wrote a paper about it called "Simple Smalltalk Testing: With Patterns".  In it, he expressed an API which is now colloquially known as "xUnit", because its implementations are named by replacing the "x" with a reference to the language of implementation: for example, Java's "JUnit", Python's "PyUnit", and C++'s "CppUnit".

Many test-driven developers who use an xUnit implementation eventually become test-framework developers.  After some time using an xUnit implementation, they realize that it lacks certain features which would make it more useful, and set about writing their own testing tool.  Such attempts, in Python, include:
  • Twisted's own trial tool, for whose genesis I am to blame.[1]
  • py.test
  • TestOOB
  • nose
  • doctest
Unfortunately, many of these test frameworks were written looking only at the problems with xUnit, without recognizing its benefits.

There are two things that every potential test framework developer needs to know about xUnit.

The Toast Paradox

The xUnit API is great.

You need a structured, well-specified way to manipulate tests.  Maybe you don't realize it yet, but when you have a test suite with a hundred thousand tests, you will want to selectively run parts of it.  You will want to be able to group tests arbitrarily, and to change their grouping, ordering, and runtime environment.  Most importantly, you'll want to do all this in a program, not necessarily with a GUI or command-line tool.  To write that program, you need a coherent, well-designed, stable API for interacting with tests as first-class objects.
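Here's a minimal sketch, using plain Python unittest (the test class and names are invented for illustration), of the kind of programmatic manipulation this makes possible: selecting part of a suite and running it entirely under program control.

    import unittest

    class ArithmeticTests(unittest.TestCase):
        def test_addition(self):
            self.assertEqual(2 + 2, 4)

        def test_subtraction(self):
            self.assertEqual(4 - 2, 2)

    loader = unittest.TestLoader()

    # Tests are ordinary objects: pick out a subset by name and
    # assemble them into a new suite, no GUI or CLI required.
    addition_only = unittest.TestSuite(
        ArithmeticTests(name)
        for name in loader.getTestCaseNames(ArithmeticTests)
        if "addition" in name
    )

    result = unittest.TestResult()
    addition_only.run(result)
    print(result.testsRun, len(result.failures), len(result.errors))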

The biggest thing that xUnit does right is that it exists.  It is a structured API for interacting with tests as first class objects.  Many attempts to implement specific features end up architecturally breaking or ignoring this API, or adding extra, implicit stuff, which will "just work" if you use the particular TestCase class that came with your tool of choice, but break if you try to customize it too heavily or start overriding internal methods.

xUnit also factors some important responsibilities away from each other.  A test case is different from a test result; a test suite contains multiple tests.  The magic that is often associated with the 'testXXX' naming convention aside, a single test case object represents a single test, and multiple runs of the same test must create multiple test objects.  These might seem obvious, but they are all important insights which must be preserved, and problems occur when they don't seem quite so obvious.  As James Newkirk said on his blog a few years ago: "I think one of the biggest screw-ups that was made when we wrote NUnit V2.0 was to not create a new instance of the test fixture class for each contained test method."  Trial had this screw-up as well, and while it has been fixed, the '-u' option will still re-use a TestCase object to run its own method again for the second invocation.
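You can see Python's unittest honoring the one-instance-per-test rule directly; this sketch (class and method names invented) just asks the loader and inspects what comes back:

    import unittest

    class InstanceDemo(unittest.TestCase):
        def test_one(self):
            pass

        def test_two(self):
            pass

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InstanceDemo)
    first, second = list(suite)

    # Two test methods, two distinct TestCase instances: per-test
    # state set up in one can never leak into the other.
    print(first is second)   # False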

Sadly, xUnit is not the solution to all of your problems.  There is something else you need to know about it.

The xUnit API is terrible.

The xUnit API is missing a lot of features that it really needs to be a generally useful test-manipulation layer.  It is because of these deficiencies that it is constantly being re-invented or worked around.  If it got more of these things right, then the ad-hoc extensions which don't fit its concepts properly wouldn't keep springing up.

The main problem with xUnit that is not simply a missing feature is that it uses the "composite" pattern, rather than the "visitor" pattern, to implement test suites.  If you want to discover which test cases a test suite contains, you are supposed to call 'run' and it will run them for you.  It is impossible to programmatically decompose a suite without stepping outside of the official xUnit API.

For example, let's say you wanted to have a "runner" object, which would selectively identify tests to be run in different subprocesses.  Without getting involved in the particular implementation details of an xUnit implementation, there's no way to figure out how many tests are going to be run once you have discovered and invoked a TestCase; you just call 'run' and hope for the best.  Of course, just about every actual xUnit implementation cheats a little bit in order to do things like generating progress bars that show how far along the test run is; but it's always possible - and intentionally supported, even - to generate tests on the fly within 'run' and run them.
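Here's a sketch of what that cheating looks like in practice.  Python's unittest happens to let you iterate over a TestSuite - an extension beyond the strict composite API, which only offers 'run' - and that is exactly what a subprocess-dispatching runner would have to rely on:

    import unittest

    def flatten(test):
        """Recursively decompose a suite into individual test cases.

        This depends on TestSuite being iterable, which is a
        Python-specific convenience, not part of the original
        xUnit API."""
        if isinstance(test, unittest.TestSuite):
            for child in test:
                yield from flatten(child)
        else:
            yield test

    suite = unittest.defaultTestLoader.discover(".")
    tests = list(flatten(suite))
    # Even this count is only a guess: a test that generates more
    # tests inside 'run' will never appear in the list.
    print(len(tests), "tests discovered")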

xUnit has no intermediary representation of a test result.  A vanilla xUnit API assumes that a result is the same thing as a reporter: in other words, when you call a method on the result, it reports it to the user immediately.  This absence of a defined way to manipulate test results for later display means that you have to take over the running of the tests if you want to do something interesting with the report; it's not possible in a strict xUnit implementation to cooperatively define two separate tools which analyze or report data about the same test run.
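To make the distinction concrete, here's a hypothetical sketch of the kind of intermediary representation xUnit lacks: a result that records the events of a run so that any number of reporters can consume the same data afterwards.  Nothing like this exists in vanilla xUnit; the class below is illustrative only.

    import unittest

    class RecordingResult(unittest.TestResult):
        """Hypothetical intermediary: record now, report later."""

        def __init__(self):
            super().__init__()
            self.events = []

        def addSuccess(self, test):
            super().addSuccess(test)
            self.events.append(("success", test.id(), None))

        def addFailure(self, test, err):
            super().addFailure(test, err)
            self.events.append(("failure", test.id(), err))

        def addError(self, test, err):
            super().addError(test, err)
            self.events.append(("error", test.id(), err))

        def replay(self, reporter):
            """Feed the recorded run to any reporter, repeatedly."""
            for event in self.events:
                reporter(event)

    # Two independent tools can now report on the same run:
    #   result.replay(summary_printer)
    #   result.replay(html_report_writer)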

The use of the composite pattern is linked to the lack of an intermediary "test discovery" step.  In Beck's original framework, you have to manually construct your suites out of individual test case objects and call a top-level function that then calls them all, although most xUnit implementations accepted as "standard" these days will provide some level of convenience functionality there.  For example, all the implementations I'm aware of will introspect for methods that begin with "test" and automatically create suites that contain one test case instance per method.
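That convenience layer amounts to something like the following sketch - a simplified version of what a loader does, introspecting a class for 'test'-prefixed methods and building one test case instance per method:

    import unittest

    def suite_from_class(case_class):
        """Build a suite with one TestCase instance per test* method."""
        names = sorted(
            name for name in dir(case_class)
            if name.startswith("test")
            and callable(getattr(case_class, name))
        )
        return unittest.TestSuite(case_class(name) for name in names)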

xUnit has no notion of cooperating hooks.  If you want to provide some common steps for setUp and tearDown in both library A and library B, you have no recourse but to build complicated diamond inheritance structures around "a.testsupport.TestCase" and "b.testsupport.TestCase", or give up and manually call functions from setUp and tearDown.  This is where Trial has gotten into the most trouble, because it sets up and tears down Twisted-specific state as part of the tool rather than allowing test cases to independently set it up.
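Concretely, the diamond workaround looks something like this sketch (the library base classes are hypothetical): every class in the hierarchy has to remember to call up to its parent, and one forgotten call silently breaks somebody's fixtures.

    import unittest

    class LibraryATestCase(unittest.TestCase):
        def setUp(self):
            super().setUp()       # must cooperate...
            self.a_state = "library A fixtures"

    class LibraryBTestCase(unittest.TestCase):
        def setUp(self):
            super().setUp()       # ...in every class in the diamond
            self.b_state = "library B fixtures"

    # A test needing both libraries inherits from both and hopes
    # that every setUp in the MRO calls super() correctly.
    class MyTests(LibraryATestCase, LibraryBTestCase):
        def test_both(self):
            self.assertEqual(self.a_state, "library A fixtures")
            self.assertEqual(self.b_state, "library B fixtures")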

The Alternative Is Worse

Those who have not learned from the benefits that xUnit provides, however, are doomed to repeat its mistakes, and often to make new ones.  The alternative - let's call it "AdHocUnit" - is very popular.  It consists of glomming features onto random parts of the test-manipulation API to support specific use-cases, without attention to the overall design of the system.

Twisted's Trial, the Zope Test Runner, Nose, and I'm sure quite a few other testing tools in Python all do this, and the result is a situation where the concepts are all basically compatible, but you can't write a test that uses features from two of these systems at once - and if you're a person (as many people are) who writes code that relies heavily on both Twisted and Zope, this can be painful.

You can tell you've got an AdHocUnit implementation when your APIs are internally throwing around strings that represent file names, module names, and method names, without reference to test objects; when you've got global, non-optional start and stop steps (not in setUp or tearDown methods) which twiddle various pieces of framework state; or when you've started coupling the semantics of 'run' to some ad-hoc internal lists of tests or suites.  You can tell you have an AdHocUnit implementation when your test discovery is convenient, but hard-coded to a particular idiom of paths and filenames.  Most of all, you can tell you've got an AdHocUnit implementation when you've got things called "TestCase" objects which subclass a TestCase class, and therefore claim to be xUnit objects, but mysteriously cannot be run by another xUnit framework in the same language.
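The string-slinging symptom, for instance, tends to look something like the following sketch (the function and its arguments are invented for illustration): names get resolved at the last possible moment, and no test object is ever visible to the caller.

    import importlib

    # AdHocUnit style: strings all the way down, no test objects
    # anywhere.  (This function is a hypothetical illustration.)
    def run_tests(module_name, method_prefix="test"):
        module = importlib.import_module(module_name)
        for name in dir(module):
            if name.startswith(method_prefix):
                getattr(module, name)()   # "running" == just calling

    # Contrast with xUnit style, where a test is an object you can
    # hold, group, and hand to any runner:
    #   suite = loader.loadTestsFromTestCase(SomeTests)
    #   suite.run(result)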

What To Do Now

The object-oriented programming community needs a better API which is as high-level and generic as xUnit.  Anyone looking to "fix" xUnit should be careful to create a well-defined structure for tests and document the API and the reasons for every decision in it.  It's interesting to note that the (very successful) Beck Testing Framework was originally presented as a paper, not as an open source project.

It might seem like testing APIs don't require this kind of rigor.  They seem deceptively simple to design: at first, all you need to do is run this bit of code, then that bit of code, and make sure they both work.  For a while, all you're doing is piling on more and more tests and making sure they all work.

Then, one day, you want to start doing things with all these tests.  You want to know how long they take to run, how much disk space they use, how many of them call a certain API.  You want to run them in different orders to make sure that they are in fact isolated, and don't interact with each other.  You'll find out, as I did, that huge numbers of your tests are badly written because your first ad-hoc attempt at the framework was wrong.
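Reordering, at least, is only a few lines of Python once tests are real objects; this sketch (the helper is hypothetical) shuffles and reruns a collection of tests, building fresh test objects for each run in keeping with the rule above:

    import random
    import unittest

    def run_shuffled(make_tests, times=10):
        """Run a test collection in several random orders to flush
        out hidden inter-test dependencies.  make_tests must build
        fresh TestCase objects each time: instances are never
        reused between runs."""
        for _ in range(times):
            tests = make_tests()
            random.shuffle(tests)
            result = unittest.TestResult()
            unittest.TestSuite(tests).run(result)
            if not result.wasSuccessful():
                return result.failures + result.errors
        return []

    # e.g.:
    # run_shuffled(lambda: list(
    #     unittest.defaultTestLoader.loadTestsFromTestCase(MyTests)))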

Test frameworks are unlike most other frameworks in that you can't gradually evolve them and depend on your tests to verify their behavior, because by changing the testing framework, you might have changed the meaning of the tests themselves.  Having a well-tested test framework can help, of course, but while you can test the test framework, you won't be testing your tests.

Of course, everything is possible in this world of infinite possibilities.  Test frameworks can, and do, evolve; but the process is slower, and more painful than other kinds of evolution.  So, when you're looking to write your own conveniences for testing, don't throw the baby out with the bathwater: keep what your xUnit implementation does well, retain compatibility with it, and build upon it.

Acknowledgments

I'd like to thank Jonathan Lange, who encouraged me to consider the benefits of xUnit in the first place, and the Massachusetts Bay Transportation Authority, without whose constant delays and breakdowns I wouldn't have had time to have written this at all.



[1]:  I didn't really start 'trial' itself; what I started was a few nasty hacks in a redistribution of pyunit.  In my defense, that predated 'unittest' as a standard-library module.  Jonathan Lange, trial's current maintainer, was the one who made it an independent tool.  Thanks to him, it is now compatible to a large extent with the standard 'unittest' module.