Twisted Do-Over

Saturday September 24, 2005
Recent fanfare over Twisted, including the totally awesome book which you should go buy right now, has gotten me thinking: is Twisted really all that great? I believe that while it is still probably the best thing out there for doing what it does, there are a few things I wish had happened differently. So here's a laundry list of things that I wish Twisted did differently, and how I would implement them if I were starting from scratch today. Maybe eventually this will be a roadmap; right now, it's wishful thinking, and too vague to be any real kind of spec.

In the innermost guts of the reactor, there is no real normalization of events. The reactor is sort of a fused engine block where all of the "work" of dispatching events happens. I'd rather that were unrolled a bit. Especially in today's world of generator-heavy Python, I'd rather that the reactor core look something like a set of wrapped iterators: a base generator that ran select() and yielded file descriptors ready for reading or writing; a generator wrapping that one, which did OS- and FD-specific I/O like recv() and send(); a wrapper above that generating application-level request/response pairs; and so forth. Think of it as a web server (in very, very broad strokes; this is not a precise API):


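    def webServer(self, connection):
        # parseRequests and respondTo are illustrative names, not real Twisted APIs.
        for request in parseRequests(connection.inputStream):
            response = self.respondTo(request)
            yield response  # each response is handed back to the layer below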


Such a system would also make it a lot clearer what "one reactor iteration" means. Rather than some arbitrary constellation of behaviors that happened to be scheduled "at the same time", one reactor iteration could be made to correspond exactly to one tick of a user-provided iterator.
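To make that layering a little more concrete, here is a minimal sketch using only the standard library's select and socket modules. The generator names (ready_fds, received_bytes, parse_requests) are hypothetical, a "request" is just a newline-terminated line, and only a single connection is handled. Each next() on the top-level generator pulls exactly as much select()/recv() work through the layers below as it needs, which is the "one tick" idea.

```python
import select
import socket

def ready_fds(socks, timeout=1.0):
    """Bottom layer: run select() and yield each socket that is ready to read."""
    while True:
        readable, _, _ = select.select(socks, [], [], timeout)
        for sock in readable:
            yield sock

def received_bytes(ready):
    """Middle layer: do the OS-level I/O (recv) for each ready socket."""
    for sock in ready:
        data = sock.recv(4096)
        if not data:  # empty read: the peer closed the connection
            return
        yield data

def parse_requests(chunks):
    """Top layer: turn a byte stream into newline-terminated "requests"."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode()

client, server = socket.socketpair()
client.sendall(b"GET /one\nGET /two\n")
client.close()
requests = parse_requests(received_bytes(ready_fds([server])))
print(next(requests))  # one tick of the top-level iterator: GET /one
print(list(requests))  # drain the rest: ['GET /two']
server.close()
```

A real reactor would also track writability and many connections at once; the point here is only how the layers nest, and that the parser never needs to know select() exists.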

Further up from that (but using that facility), I wish that we had included SEDA's notion of a "stage". This would have made a few things a lot easier. For example, it would be nice to have a well-defined notion of a request/response-processing web server that could generate a "response" object, possibly from a thread, but have that "response" be processed entirely asynchronously in the main thread.
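A minimal sketch of what such a stage might look like, assuming nothing from Twisted: the Stage class, its enqueue and run_pending methods, and the handler names below are all hypothetical. A worker thread does the possibly-blocking work and only produces a "response" object; the stage's handler then processes it in whichever thread drains the queue (in a Twisted program, that would be the reactor thread).

```python
import queue
import threading

class Stage:
    """Hypothetical SEDA-style stage: an event queue drained by one handler.

    Any thread may enqueue events; the handler always runs in whichever
    thread calls run_pending()."""

    def __init__(self, handler):
        self.handler = handler
        self.inbox = queue.Queue()  # thread-safe; does its own locking

    def enqueue(self, event):
        self.inbox.put(event)

    def run_pending(self):
        """Handle every event queued so far and return the handler results."""
        results = []
        while True:
            try:
                event = self.inbox.get_nowait()
            except queue.Empty:
                return results
            results.append(self.handler(event))

def write_response(response):
    # Stand-in for asynchronous network output; records which thread ran it.
    return response, threading.current_thread().name

responses = Stage(write_response)

def handle_request(path):
    # Possibly slow or blocking work, done in a worker thread; the only thing
    # it hands back is a "response" object placed on the stage's queue.
    responses.enqueue("200 OK for " + path)

worker = threading.Thread(target=handle_request, args=("/index",), name="worker")
worker.start()
worker.join()
print(responses.run_pending())  # written by this thread, not by "worker"
```

The design choice is that threads never touch the network directly; they communicate only by putting objects on a queue, so the I/O side stays single-threaded.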

In particular, having a notion of a "stage" would make it a lot easier to run full database transactions within threads, isolating them from communications code, by stipulating that transactions must produce notifications or network I/O in the form of output objects placed into a queue. Recently I surveyed some open-source Twisted code in the wild and answered a bunch of questions, and both have suggested to me that many Twisted developers now believe that the correct way to interface with a relational database is to turn *EVERY* SQL statement into a Deferred which is handled individually.

This is a tangent, but allow me to offer a bit of advice. The documentation is really poor and never says this, but using Twisted (or rather ADBAPI) to convert every single SQL statement into a separate transaction and handle its results separately has a whole slew of problems. First of all, it's slow: you have to acquire and release thread mutexes on every operation. Second, it is unsafe: your conceptual transactions might be interrupted at any moment, leaving your database in an inconsistent state. Also in the realm of safety, notifications generated from within a transaction that gets rolled back are sent to the network anyway, so two different database-using processes talking to each other can trivially become inconsistent. Take a look at the 'runInteraction' API and give some thought to what represents a "whole" transaction in your database. Moving this transaction processing out of the main loop *IS* an appropriate use of threads, and in fact adbapi does it internally. This is doubly true if your application or your SQL layer does any caching of SQL results: to be sure that the cache is consistent with the DB, you have to keep track of whether and when transactions are rolled back.
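Here is a standard-library sketch of the shape this advice encourages. sqlite3 stands in for whatever DB-API driver adbapi would wrap, and run_interaction, transfer and the accounts table are hypothetical. Like adbapi's runInteraction, the function receives a cursor and runs as one transaction, committed if it returns and rolled back if it raises. The outbox, which holds notifications until after the commit, is an extra assumption here rather than something adbapi provides for you.

```python
import sqlite3

def run_interaction(conn, interaction, send, *args):
    """Run interaction(cursor, outbox, *args) as ONE database transaction.

    Commit if it returns, roll back if it raises. Notifications appended to
    `outbox` are handed to `send` only after a successful commit, so a
    rolled-back transaction never leaks messages to the network."""
    outbox = []
    cursor = conn.cursor()
    try:
        result = interaction(cursor, outbox, *args)
    except Exception:
        conn.rollback()
        raise
    conn.commit()
    for message in outbox:
        send(message)
    return result

def transfer(cursor, outbox, src, dst, amount):
    # Both UPDATEs are one logical transaction, not two independent calls.
    cursor.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
    cursor.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    (balance,) = cursor.execute(
        "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
    if balance < 0:
        raise ValueError("insufficient funds")
    outbox.append(f"moved {amount} from {src} to {dst}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 10), ("bob", 0)])
conn.commit()

sent = []
run_interaction(conn, transfer, sent.append, "alice", "bob", 7)
try:
    run_interaction(conn, transfer, sent.append, "alice", "bob", 7)  # would overdraw
except ValueError:
    pass
print(sent)  # ['moved 7 from alice to bob']
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 3), ('bob', 7)]
```

The second transfer fails halfway through, yet neither the half-applied UPDATE nor its notification survives. That is exactly what per-statement Deferreds cannot guarantee.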

Back to the main point. I also would have designed the reactor access API a bit differently. 'from twisted.internet import reactor' looks convenient, but is highly misleading. Figuring out which reactor your process is currently using is part of a more general problem of execution context. There are other objects that applications wish to find in the same way: for example, the current database connection, the current log monitor, or the current HTTP request. twisted.python.context deals with this in a general manner, but because it is not used consistently to access important objects, it has not been subjected to the testing and refining that it needed. The worst side effect of this has been the "context object" abomination that afflicts Nevow and Web2.
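As an illustration of looking things up from execution context instead of a module-level global, here is a sketch in the spirit of twisted.python.context. It uses Python's contextvars module, which post-dates this post, as a modern stand-in. current_reactor, schedule and run_with are hypothetical names, and the "reactors" are just placeholder strings.

```python
import contextvars

# Hypothetical: the "current reactor" is found through execution context,
# not imported as a process-wide global.
current_reactor = contextvars.ContextVar("current_reactor", default="default-reactor")

def schedule(task):
    # Library code asks the context which reactor it is running under.
    return (current_reactor.get(), task)

def run_with(reactor, func, *args):
    """Call func with `reactor` as the current reactor, then restore the old one."""
    token = current_reactor.set(reactor)
    try:
        return func(*args)
    finally:
        current_reactor.reset(token)

print(schedule("a"))                            # ('default-reactor', 'a')
print(run_with("test-reactor", schedule, "b"))  # ('test-reactor', 'b')
print(schedule("c"))                            # ('default-reactor', 'c')
```

The same mechanism would serve the current database connection, log monitor or HTTP request, which is the argument for using one context facility consistently rather than a separate ad hoc object for each.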

I also would have designed Deferreds to be more central to the whole thing, and optimized the hell out of them rather than worrying about their performance. For example, it would be a lot easier for many applications if deferLater were the default behavior of callLater. The same goes for deferToThread versus callFromThread. The main reason that the reactor does not use Deferreds for these, or for the client connection API, is a general nervousness that Deferreds would be hard to implement in C, and therefore that the reactor API shouldn't require them. In retrospect this is silly (especially now that James Knight has actually gotten further implementing Deferreds in C than anyone else has gotten on implementing the reactor there).
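To show what "deferLater as the default behavior of callLater" would mean for callers, here is a toy sketch. MiniDeferred and ToyReactor are hypothetical and much simpler than Twisted's real Deferred (no errbacks, no chaining of Deferreds). The only point is that callLater hands back an object you can attach callbacks to, so the scheduled function's result is not thrown away. A fake clock is used instead of sleeping.

```python
import heapq
import itertools

class MiniDeferred:
    """Toy stand-in for a Deferred: a result plus a chain of callbacks."""
    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, fn):
        if self.called:
            self.result = fn(self.result)  # already fired: run immediately
        else:
            self.callbacks.append(fn)
        return self  # allow chaining, as with real Deferreds

    def callback(self, result):
        self.called = True
        self.result = result
        for fn in self.callbacks:
            self.result = fn(self.result)  # each callback sees the previous result
        self.callbacks = []

class ToyReactor:
    """Hypothetical reactor whose callLater returns a Deferred for fn's result."""
    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: equal times stay FIFO

    def callLater(self, delay, fn, *args):
        d = MiniDeferred()
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), d, fn, args))
        return d

    def run(self):
        # Fake clock: jump straight to each deadline instead of sleeping.
        while self._queue:
            when, _, d, fn, args = heapq.heappop(self._queue)
            self.now = when
            d.callback(fn(*args))

toy = ToyReactor()
seen = []
toy.callLater(2, lambda: "second").addCallback(seen.append)
toy.callLater(1, lambda x: x * 10, 4).addCallback(lambda v: v + 2).addCallback(seen.append)
toy.run()
print(seen)  # [42, 'second']
```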