A grotesque parody of science

Thursday January 27, 2005
Unbelievably naive mistake, or politically motivated lies? You be the judge.

I have already spent way too much time talking about this piece of garbage, so I am putting up a little persistent comment about it.

This so-called "benchmark" is nothing more than an insult. Yes, he has graphs, but these graphs are not actually measuring anything. The piece claims to present performance metrics for an MDI application doing image transformations.

The advocacy position which he is so anxious to misinterpret is this: Twisted is an alternative to threads for multiplexed I/O. Many, many applications multiplex I/O, and they do it in a variety of ways which are inefficient and bug-prone, the most popular (and most dangerous) approach being thread-per-connection. There is an explanation of the problem (as well as a link to Twisted) at threading.2038bug.com.

He accuses Twisted developers of letting emotions get in the way. Am I emotional about this? Yes, I am. I am angry about it for good reason. The Twisted team has spent almost a hundred man-years of effort producing something that we could be proud of, to provide our users with a better, more test-friendly, less error-prone way of programming. It's not a new idea, but I like to think we've brought something useful to the table by providing an integrated platform that supports it.

I don't think that my efforts, or those of my colleagues, deserve to be misrepresented in this way. I am upset and writing about this because I am afraid that programmers who don't know any better will see graphs and think that he's proposing a legitimate approach to solve some problem, and I will one day be called upon to help them with their code. That code will, thanks to "benchmarks" like this one, inevitably perform extremely poorly and be a total mess of race conditions. I am trying to improve the state of the art in the industry, and it is a lot of work. Widespread circulation of an afternoon's dalliance with a plotting program like the above can undo years of work trying to educate people.

I thought that I would write some benchmarks where the graphs went in the other direction, but the internet is already brimming with graphs that demonstrate the superior scalability of multiplexing with events rather than threads. Dignifying this sophomoric potshot with actual data would be a lot more than it deserves. If you are truly skeptical, I would recommend attending Itamar Shtull-Trauring's Twisted talk at PyCon 2005, "Fast Networking with Python". This will give you a much clearer view of Twisted's current performance problems than the graphs drawn by some script-kiddie who has a grudge because he got banned from an IRC channel for spreading lies.

Just as refuting the numbers would be a waste of effort, refuting every one of the lies and/or misunderstandings in each paragraph could take all day. Instead, I'll just debunk a sample of the most egregious stuff, and hopefully the pattern won't be too hard to extrapolate, for those of you unfamiliar with the subject matter.

Threads will work on multiprocessor and hyperthreading machines automatically. On similar hardware Twisted will use only 1/n of the available processing power, where n is the number or virtual or physicial [sic] processors.

What he means is that Python will only use 1/n of the available processing power. For threaded code to actually make use of SMP, you need to relinquish the global interpreter lock and write all of the parallelized code in C. The global interpreter lock is a hard problem to solve, since making Python multi-CPU friendly actually makes it slower. Note that you don't need to do anything special to take advantage of SMP if you use Twisted's recommended, non-threaded way of parallelizing things, which is spawning multiple worker subprocesses.
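To make the subprocess point concrete, here is a minimal sketch of it using only the standard library's `subprocess` module, not Twisted's own process support. The job (summing squares), the `WORKER_SOURCE` script, and the function name `sum_squares_parallel` are all made up for illustration. Each worker is a separate interpreter with its own global interpreter lock, so the operating system is free to schedule them on different CPUs.

```python
import subprocess
import sys

# Hypothetical CPU-bound job: sum the squares of the integers in [start, stop).
WORKER_SOURCE = """
import sys
start, stop = int(sys.argv[1]), int(sys.argv[2])
print(sum(i * i for i in range(start, stop)))
"""


def sum_squares_parallel(limit, workers=4):
    """Split [0, limit) into chunks and compute each chunk in its own process."""
    step = max(1, -(-limit // workers))  # ceiling division, never zero
    procs = []
    for start in range(0, limit, step):
        stop = min(start + step, limit)
        # Each worker is a separate Python interpreter with its own GIL.
        procs.append(subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(start), str(stop)],
            stdout=subprocess.PIPE,
            text=True,
        ))
    # Every worker is already running by now; collect the partial sums.
    return sum(int(proc.communicate()[0]) for proc in procs)


if __name__ == "__main__":
    print(sum_squares_parallel(1_000_000))
```

The parent starts every worker before it waits on any of them, which is what lets the chunks run concurrently. No lock appears anywhere, because the workers share nothing.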

So, Twisted can use exactly as much processing power as Python, especially because Twisted supports threads.

Twisted people think you should spawn a separate process for intensive work. If you do this you need to synchronize resource access as you would with threads. That is probably the most mentioned problem with threads.

You don't need to synchronize resource access when you use subprocesses. You can copy data to multiple subprocesses and serialize resource access without doing any extra work, since Twisted will deliver attempts at resource access through normal I/O delivery channels, which are processed by the main loop. Also, you can run your subprocesses in isolation, without concern for synchronized access to shared data structures. In the case which he seems to be talking about, i.e. that of large shared memory objects which need to be mutated and then operated upon by multiple cooperating processes, you can still avoid locking by using a tuple-space model of interaction and delivering work to subprocesses through pipes rather than delivering data. This still has only one simple program managing the interactions of many, takes advantage of the OS's much-vaunted thread-switching abilities, and doesn't require mutexes on every operation.
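A rough sketch of that arrangement follows. It uses the standard library's `asyncio` as a stand-in event loop, which is an assumption made to keep the example self-contained; it is not Twisted's API. The worker script and the names `run_worker` and `square_all` are hypothetical. The parent process owns the job list and the results dictionary, and every access to them happens in the single event-loop thread, so there is no mutex in sight.

```python
import asyncio
import sys

# Hypothetical worker: reads one integer per line on stdin, writes its square.
WORKER_SOURCE = """
import sys
for line in sys.stdin:
    print(int(line) ** 2, flush=True)
"""


async def run_worker(jobs, results):
    """Feed jobs to one subprocess over a pipe until the job list is empty."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", WORKER_SOURCE,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    while jobs:
        # No lock needed: only the event-loop thread ever touches `jobs`,
        # and there is no await between the check above and this pop.
        n = jobs.pop()
        proc.stdin.write(f"{n}\n".encode())
        await proc.stdin.drain()
        reply = await proc.stdout.readline()
        results[n] = int(reply)  # again, no lock
    proc.stdin.close()  # EOF tells the worker to exit
    await proc.wait()


async def square_all(numbers, workers=3):
    """Distribute the numbers across worker subprocesses and gather results."""
    jobs = list(numbers)
    results = {}
    await asyncio.gather(*(run_worker(jobs, results) for _ in range(workers)))
    return results


if __name__ == "__main__":
    print(asyncio.run(square_all(range(10))))
```

Work travels to the subprocesses as small messages over pipes, and the shared state stays in one place. The operating system does the switching between workers; the main loop only reacts to whichever pipe has data.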

You also need to find an effective portable method of inter-process communication. This is not much of a problem, but it is something you wouldn't have to do with threads.

You certainly have to do this with threads. It might appear as though you do not, and in some cases it may be slightly easier to implement, but if you don't track which objects are in use by which threads, mutex overhead will quickly cripple any performance gains that you'd see in a threaded application. So your IPC mechanism with threads is queues, or condition variables, rather than pipes or shared memory, but it's still there and it still requires a lot of maintenance.
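For comparison, here is roughly what the same kind of job looks like with threads, as a sketch with made-up names (`worker`, `square_with_threads`). The communication mechanism has not gone away: it has become a pair of `queue.Queue` objects plus a shutdown sentinel, all of which somebody has to design and maintain.

```python
import queue
import threading


def worker(inbox, outbox):
    """Pull numbers from the inbox until the None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put((item, item * item))


def square_with_threads(numbers, workers=3):
    """Square each number on a pool of threads, communicating only via queues."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()

    count = 0
    for n in numbers:
        inbox.put(n)
        count += 1
    for _ in threads:
        inbox.put(None)  # one sentinel per thread

    # The queues do the locking internally; the result dict is built
    # only in this thread, so it needs no lock of its own.
    results = dict(outbox.get() for _ in range(count))
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(square_with_threads(range(5)))
```

The queues are the threaded program's pipes. Take them away and share the objects directly, and every access to those objects needs a mutex instead.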

In conclusion, I stand by one of arensito's last claims:

When I approached Twisted people with questions about these results I was told I was not worth listening to. Followers stated bluntly they were smarter than me.

In fact, he isn't worth listening to, and I am proud to say that the Twisted team is smarter than he is. More than that: he's worth not listening to. The "benchmark" that he has proposed does not test anything about Twisted, and does not test anything meaningful about his hypothesis.

There's lots of work to be done on Twisted, and it certainly has its share of performance problems. It's by no means the fastest system of its kind. I am always excited to hear about ways it can be improved, but don't just make up a bunch of lies, write a while loop, slap a graph on it, and claim you've discovered something better.