Wednesday, July 02, 2008

Constructive Criticism

I frequently say that I'm a big fan of constructive criticism of Twisted, but I rarely get it.  People either gush about how incredibly awesomely spectacularly awesome Twisted is, or they directionlessly rant about how much it sucks, but aside from a fairly small group of regulars who file issues on the Twisted tracker, I don't hear much in between.

I caught wind of (and responded to) some blog comments of the latter type (directionless ranting) from Lakin Wecker.  After I responded, in an unusual response for someone writing such comments, he apologized and promised to do much better.  He has responded with some much more specific and potentially constructive criticism, ominously entitled "twisted part 1".

Lakin, thanks for reformulating your complaints in a more productive way.  I do think that some useful things might happen as a result of this article.  While I don't necessarily agree with it, I do care about this type of criticism.  In order to demonstrate my appreciation, I will try to make this a thorough reply.

It sounds like there are several mostly separate issues that you had here.  I'll address them one at a time.

Twisted Mail

I believe that the main issue is that the twisted.mail API is missing some convenience functionality which will allow users to quickly build SMTP responders that deal with whole messages.  This is definitely a shortcoming of twisted.mail.

However, this shortcoming is not entirely unintentional.  In general, Twisted's interfaces encourage you to write code which scales to arbitrary volumes of input.  IMessage is a thing that can receive a message, rather than a fully parsed in-memory message, because we want to encourage users to write servers that don't fall over.  If you have to handle each line as it arrives, it's less likely that you'll die if you receive a message bigger than the memory of the machine that is running the server.

That's not to say that there shouldn't be some additional, higher-level interface which does what you want.  Quotient, for example, uses twisted.mail, but provides a representation of a message which has all of its data written to disk first, and efficient APIs for accessing things like headers without fetching the whole message back into memory.  twisted.mail almost provides something like this itself; if you poke around in twisted.mail.maildir and twisted.mail.mail, you'll find FileMessage (an implementation of a message which writes its contents to disk) and MaildirDirdbmDomain (an implementation of IDomain which uses a directory of maildirs to deliver messages).  Note that these would not have been useful for your use case: they just show that we're happy to have higher-level stuff implemented within Twisted.

One function which might be cool to provide is something which will parse an incoming SMTP message and convert it to an email.Message.Message, then hand it off to some user code.  Even better would be to integrate this with the command-line "twistd mail" tool, such that you could easily deploy such a class as an LMTP server or something like that.
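To make the idea concrete, here is a minimal sketch in plain Python that does not import Twisted.  The class name WholeMessage and the onMessage callback are hypothetical; only the three method names (lineReceived, eomReceived, connectionLost) follow what IMessage asks for.  A real version would declare the interface and return a Deferred from eomReceived, and buffering the whole message in memory gives up exactly the scaling property described above.

```python
import email


class WholeMessage:
    """Hypothetical helper: buffer SMTP lines, then parse them in one go."""

    def __init__(self, onMessage):
        # onMessage is an assumed user callback taking one parsed message.
        self.onMessage = onMessage
        self.lines = []

    def lineReceived(self, line):
        # Called once per line of message data, without the line ending.
        self.lines.append(line)

    def eomReceived(self):
        # End of message: parse what was buffered and hand it to user code.
        # (With Twisted this would return a Deferred, not a plain value.)
        parsed = email.message_from_bytes(b"\r\n".join(self.lines))
        self.lines = []
        return self.onMessage(parsed)

    def connectionLost(self):
        # The client went away mid-message: discard the partial data.
        self.lines = []


# Drive it by hand, the way an SMTP protocol would.
received = []
msg = WholeMessage(received.append)
for line in [b"From: racoon@foo.com", b"Subject: hello", b"", b"body text"]:
    msg.lineReceived(line)
msg.eomReceived()
```

User code then only ever sees a complete, parsed message, at the cost of holding all of it in memory.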

Although we don't have all the pieces you need, there is also the ever-present issue of documentation of the pieces which we do have.  Some of the code in twisted.mail might have been useful to you if its documentation had been better.  For example, you might also notice some pretty strong similarities between twisted.mail.protocols.DomainDeliveryBase.receivedHeader and your own implementation of that method.

My main point here is that fixing this is a simple matter of programming (or, in the latter case, of documenting).  I think that the best way to deal with that shortcoming is simply to submit patches to twisted.mail which add the functionality that you want.  Lots of open source projects are like this: they were driven just far enough to satisfy their implementors' use-cases.  twisted.mail is a perfectly functional and simple API if you want to build what it is designed to build.

When we're talking about "Twisted", we're typically talking about the core, and the programming model that comes with it.  When you get into the specifics of an API like twisted.mail, twisted.names, and even twisted.web (maybe even especially twisted.web) you're going to find plenty of shortcomings and areas where it doesn't yet do what you need.  There are some areas which are downright bad, and some which are so bad that they're embarrassing.  We need volunteers to identify the areas that are lacking and add to them.

Twisted vs. Things Which Are Not Twisted

The reason that I disagree with your conclusion that Twisted as a whole is necessarily more complex, hard to explain, too dense, unreadable (etc, etc) is that the main thing to compare it to is shared-state multithreaded socket servers, or asyncore.

Here's a good example of what makes Twisted simple, at its core:
from twisted.internet.protocol import Protocol
class Echo(Protocol):
  def dataReceived(self, data):
    self.transport.write(data)

This server supports a large number of clients.  It supports TLS.  It's cross-platform.  It supports both inbound and outbound connections.  And yet, including the import, it's only 4 lines of code.  You can write a threaded version of this which appears to be just as short, but it's pretty much impossible to do without getting a half-dozen subtleties of either a socket API or a concurrency issue wrong.

Take, for example, your "smtp_helper.py".  You don't provide any documentation of its concurrency properties, but the implementation of 'start' is almost certainly wrong.  For one thing, starting the same TestSMTPServer twice, or even starting two completely different TestSMTPServers at the same time, will not work.  Of course, you'd never do that, but let's say your SMTP client also used asyncore and a thread.  Now you've got a client using socket_map in the main thread and a server using socket_map in another thread.  Also, there's the fact that process_message may be called from an arbitrary thread; if it ever grew to do anything more complex than appending to a list, it would need its own serialization logic.  This isn't something that could be fixed: the entire approach is wrong, and you would need to rewrite all of your tests to work completely differently in order to fix it.  You'd need to asynchronously start both your client and your server, then have an API for letting your tests know when both of them are done.  By the time you're doing that, you're practically implementing your own mini-Twisted, along with extensions to unittest that turn it into Trial.

Ironically, you can use Twisted to fix this problem.  If you really like the API presented by the 'smtpd' module, you could write a wrapper which would make an asyncore dispatcher look like a Twisted protocol factory (or protocol), and hook asyncore into the main loop, then use 'trial' for your testing.  How exactly one would implement such a thing is beyond the scope of this post, but it's not actually that hard; just look at the relatively few methods that asyncore.dispatcher calls on self.socket and you'll probably get the idea.

I feel that the comparison of "Twisted" versus "non-Twisted" code you've presented is a bit unfair.  The Twisted example is a demonstration of utility functionality that Twisted Mail is missing, not a core idea that Twisted implements wrong.  The code it is being compared to looks simple only because critical areas of correctness that would need to be addressed in a real system (and will probably eventually need to be addressed, if the test is maintained for a long time) are being completely ignored.  The twisted example, if it fails, will fail relatively straightforwardly; the other example's failure mode will be an obscure traceback coming out of otherwise unrelated (but not thread-safe) code.

However, your subjective experience of some areas of Twisted being hard to understand and use is entirely valid.  Your detailed description of why it was difficult for you has already been useful, but I hope you will stick around and help us improve the situation for future users as well.

Trial and Testing

Perhaps the more significant issue that you discovered while you were working on this is the subtle mystery of getting Twisted to fully shut down a connection and a bound port inside a test.  This is really way too hard, and it is a problem which affects anyone who wants to use Trial for integration testing.

Although I'd really like to see this problem dealt with in a systematic way, and I'd like it to be easy as pie to write integration tests with trial, there is a reason that the issue hasn't been fixed.  As the Twisted team has been improving our testing skills, we've been finding more and more that you absolutely need good unit tests before you can really write integration tests.  Without unit tests, you don't know whether the individual pieces work, so they tend to break in surprising ways when you put them together.  In Twisted itself we are still in the process of rehabilitating a very large, and very old hodgepodge of unit, functional, and integration tests to be broken down into smaller, more coherent unit tests.  Until that process is finished, and trial has been tuned to be as good as possible for that sort of testing, integration testing isn't going to be a focus of any core developer.

I agree with the advice that you were given on IRC.  We could eliminate the particular surprise of doing a clean connection shut-down in trial, and provide a good way to do it, but you'd still face issues with your tests where the SMTP API might be scheduling timed calls or doing other things behind your back which would be difficult to monitor or shut down.  Talking to a mock message-sending implementation for starters would be a lot easier.

I can understand your concern about passing more parameters.  Luckily, this is Python: you don't necessarily need to change the interface of the system you're testing.  If you have a system, A, that depends on another system, B, to perform some of its work, you need to have a reference from A to B somewhere.  That can be passed as a parameter, imported as an object, or loaded as a module.  In Java, you'd need to change all your type declarations and do some kind of dependency injection magic, but in Python you can always cheat.  The worst case in Python, after all, is that A imports B as a module.  So, if you don't want to add any parameters, or even any attributes or methods, consider this:

# A.py
import B

def stuff():
  B.functionFromB().otherStuff()

# test_A.py
import unittest
import A
import B

class MyTest(unittest.TestCase):
  def functionFromB(self):
    result = B.functionFromB()
    # Modify the result for the test, if you like
    return result

  def setUp(self):
    A.B = self
  def tearDown(self):
    A.B = B


Some might consider this a bit gross, of course.  It might be cleaner to add a specific API for plugging in a different implementation of B.  However, it's useful to use this technique in cases — such as the one you described in your post — where you are trying to add some test coverage for an API which has already been written and you don't have control over.
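The same trick can be written so that the original reference is restored automatically.  The sketch below assumes a Python whose standard library includes unittest.mock, and it builds stand-in A and B modules inline (with made-up return values) purely so that it runs on its own; in real code they would be ordinary imported modules.

```python
import types
import unittest
from unittest import mock

# Stand-ins for the real modules, built inline so the sketch is self-contained.
B = types.ModuleType("B")
B.functionFromB = lambda: "real result"

A = types.ModuleType("A")
A.B = B  # plays the role of "import B" at the top of A.py


def _stuff():
    # Looks B up through A at call time, as a module-level global would be.
    return A.B.functionFromB().upper()


A.stuff = _stuff


class MyTest(unittest.TestCase):
    def test_stuff_uses_replacement(self):
        fake = types.SimpleNamespace(functionFromB=lambda: "fake result")
        # Replace A's reference to B only for the duration of the block.
        with mock.patch.object(A, "B", fake):
            self.assertEqual(A.stuff(), "FAKE RESULT")
        # Outside the block the original module is back in place.
        self.assertEqual(A.stuff(), "REAL RESULT")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the patch is undone when the block exits, even if the test fails, there is no tearDown to forget.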

I hope that digression helped, but I don't want to turn this into a screed about what you could have done better; let's consider your requirements as fixed (this needs to be an integration test) and look at what Twisted could have done better.

One thing the core team has been talking a lot about lately has been the development of verified test doubles.  We don't have a lot of them, and we need more.  For example, if you could pass a fake reactor to both your SMTP sender and receiver code, then you could manually make sure it was sending traffic at the appropriate times, to the appropriate hosts, and fail your test in sensible ways if it did something unexpected, rather than just having trial bomb out on you.  This would also let you have regression tests to make sure that your code was working with the latest version of Twisted, in case the APIs in question changed.  You wouldn't need your test to have a full, complete, clean shutdown of your SMTP connections because they would simply be garbage collected, as they would not be connected to the real reactor.  You can see an example of what this might look like in twisted.internet.task.Clock.  If someone contributed a real, documented, usable, verified test double for IReactorTCP, we would all be eternally grateful, especially if they could coalesce all the uses of the numerous half-assed attempts at it in our own test suite.
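To give a flavor of the idea, here is a stripped-down sketch of a test double for the timed-call part of a reactor.  FakeClock and DelayedSender are hypothetical names written for this illustration; this is not the code of twisted.internet.task.Clock, only the same general shape, and it is certainly not verified against a real reactor.

```python
class FakeClock:
    """Hypothetical test double for the timed-call part of a reactor."""

    def __init__(self):
        self.now = 0.0
        self.calls = []  # pending (time, sequence, function, args) entries
        self._sequence = 0

    def callLater(self, delay, function, *args):
        # Record the call instead of asking a real event loop to schedule it.
        self._sequence += 1
        self.calls.append((self.now + delay, self._sequence, function, args))

    def advance(self, amount):
        # Move time forward, then run everything that has come due,
        # earliest first (ties broken by scheduling order).
        self.now += amount
        while True:
            due = [c for c in self.calls if c[0] <= self.now]
            if not due:
                break
            call = min(due, key=lambda c: (c[0], c[1]))
            self.calls.remove(call)
            call[2](*call[3])


class DelayedSender:
    """Example code under test: sends a message five seconds after start()."""

    def __init__(self, clock, send):
        self.clock = clock
        self.send = send

    def start(self):
        self.clock.callLater(5, self.send, "hello")


sent = []
clock = FakeClock()
DelayedSender(clock, sent.append).start()
clock.advance(4)
early = list(sent)  # snapshot before the call is due
clock.advance(1)
```

The test decides when time passes, so nothing is left running that needs to be shut down afterwards.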

Something else we could do is write a supported factory wrapper which would allow the use of a real factory and connection in a trial test, but that would shut everything down cleanly at the connection level in tearDown.  I would personally like this a lot, but I can't promise that it would be popular with the rest of the Twisted team.  We all spend a lot of time trying to convince people to write unit tests before integration tests.  I know that I'm a little concerned that providing great integration testing support will just lead to more people being confused by weird interactions in the guts of whatever protocol they're talking to.  Eventually, however, integration tests can be useful, and I wrote the beginnings of the wrapper that I'm suggesting when I was writing tests for the AMP protocol.  You might be able to use that as an example even if Twisted doesn't provide any public APIs for that sort of thing.

Conclusions

Unfortunately there's not much I can do immediately to fix the problems that you've had, Lakin.  If someone with a similar level of Twisted experience attempts a similar task in the near future, it's likely that they'll hit the same issues.  I barely (read: didn't actually) have the time to write this blog post, and I definitely don't have the time to fix the problems I've outlined.

While there are definitely some problems here, I don't think the situation is really all that bad.  According to your post, learning enough about Twisted to do what you were doing and writing the Twisted version of this code took only 3 days.  This learning curve is not as steep as some have accused Twisted of having.  Presumably it would have taken someone already familiar with twisted.mail and trial much less time.  It didn't take me much more than 2 minutes to read and understand it :-).  As I mentioned above, your friend's threaded smtpd implementation has some pretty severe problems which might cause maintenance headaches later, whereas you were quite careful to do a proper shutdown (the trickiest thing to get right) in the Twisted version, so it is likely to be fairly robust going forward.

5 comments:

Lakin Wecker said...

Questions:
* You feel that I unfairly attacked twisted and that the majority of my issues were with two pieces of twisted: twisted.mail and twisted.trial. I'd like to better understand what you consider to be the 'core' of twisted. Twisted is an exclusive approach ... if you use twisted, you are nearly required (and at the very least STRONGLY encouraged) to use all of twisted's 'non-core' libraries to solve the problems they solve. Use twisted.web to speak http instead of httplib, use twisted.mail to send/receive email instead of smtp(d). So the quality of those libraries very directly affects a developer's overall experience with twisted. I suspect it is probably the exception, not the rule, to build a networked application with a custom protocol, such that you would only ever need the core of twisted. I think it is very acceptable to apply lessons learned using any of twisted's 'non-core' libraries to the overall experience.

Agreements:

* twisted.mail would benefit from better documentation as well as a nicer higher-level api for those cases where one doesn't want to tweak everything. I agree, and if I had enough time to submit a patch to fix twisted, I'd first want to finish the plethora of un-finished projects that I've built up over the years. ;)
* My test smtp server's start method would have synchronization issues if used outside of a single thread unittest environment that it was intended to run in.


Disagreements:
* That if we needed a more robust version of the threaded server (one that avoids the synchronization problems), that we'd have to change the approach and that the result would end up being a mini-twisted and mini-trial. You claim that the process_message may be called from an arbitrary thread and it'll not work if called from two threads. The asyncore methods aren't intended for use in this way, they're intended to have one thread always call process_message (through the use of asyncore.loop). If a programmer mistakenly calls it from two threads, it's the fault of the programmer, not the approach. Twisted is the same way. There are very few parts of Twisted that are threadsafe. And we could say the same thing about every part of twisted that isn't thread safe. IE: If some programmer calls function X from two threads, twisted will go *BOOM*. Following that line of reasoning, I suppose we could call the twisted approach "wrong" at that point too.

* That the twisted version will fail in a straightforward manner. You also imply that the error message you get from twisted will be more useful. Both of these are very subjective and in my experience BOTH twisted and threads provide stack traces and error messages which leave a lot to be desired. For instance, I just purposely sabotaged my twisted implementation and got a 2 minute hang followed by 4 errors from 2 tests (something that always confused me is how twisted produces multiple errors from a single test if indeed it's not threaded. Shouldn't the first error stop the test?) and a stack trace that's (mostly) unrelated to my code:

integration_tests_twisted
EmailTestMixin
test_conversation ... [ERROR]
[ERROR]
[ERROR]
[ERROR]
test_register ... [ERROR]

===============================================================================
[ERROR]: integration_tests_twisted.EmailTestMixin.test_conversation

Traceback (most recent call last):
Failure: twisted.internet.defer.FirstError: FirstError(<twisted.python.failure.Failure <class 'twisted.trial.unittest.FailTest'>>, 1)
===============================================================================
[ERROR]: integration_tests_twisted.EmailTestMixin.test_conversation

Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <integration_tests_twisted.EmailTestMixin testMethod=test_conversation> (tearDown) still running at 120.0 secs
===============================================================================
[ERROR]: integration_tests_twisted.EmailTestMixin.test_conversation

Traceback (most recent call last):
File "/home/lakin/Desktop/twisted/part1/twisted/integration_tests_twisted.py", line 40, in confirm_first_email
self.assertTrue('racoon@foo.com says' in received_message)
twisted.trial.unittest.FailTest: None
===============================================================================
[ERROR]: integration_tests_twisted.EmailTestMixin.test_conversation

Traceback (most recent call last):
File "/home/lakin/Desktop/twisted/part1/twisted/integration_tests_twisted.py", line 47, in confirm_second_email
self.assertTrue('simon@foo.com says' in received_message)
twisted.trial.unittest.FailTest: None
===============================================================================
[ERROR]: integration_tests_twisted.EmailTestMixin.test_register

Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <integration_tests_twisted.EmailTestMixin testMethod=test_register> (tearDown) still running at 120.0 secs
-------------------------------------------------------------------------------
Ran 2 tests in 240.119s

FAILED (errors=5)


The only two errors that indicate the potential issue are:
Failure: twisted.internet.defer.TimeoutError: <integration_tests_twisted.EmailTestMixin testMethod=test_register> (tearDown) still running at 120.0 secs
So, what did I do wrong here? ;)

glyph said...

re: questions:

The core of twisted is really twisted.internet.

You are correct, though, that the libraries that ship with Twisted are a big part of its draw. Issues with the quality of those libraries do definitely affect developers. You have identified some problems. However, the fact that those libraries have problems is not an indication that those libraries are of overall poor quality. For example, you identified a problem, that I agreed with, in twisted.mail.smtp for solving a particular simple problem; however, lots of other simple problems can quickly be dealt with using twisted.mail.smtp.

Some quick examples: ESMTP commands, including AUTH and STARTTLS. Scalable handling of arbitrarily sized messages (__lines is a private attribute in smtpd.py, and therefore must be in memory). Scaling to large numbers of sockets (asyncore does not support epoll, kqueue, or any other fancy, large scalable polling mechanism).

And that's just twisted.mail.smtp. If you start looking in, say, twisted.mail.imap4, you will find a remarkably full-featured IMAP client and server implementation, worlds beyond imaplib in terms of both correctness and completeness. Could it be better? Sure. Is it done? No. But it's not bad. Also, twisted.mail.pop3 provides a pretty good POP server and client. More than poplib's client-only offering. And, I might add, despite Twisted's eternal struggle with documentation, these have tons of docstrings which explain their function.

Agreements: I'm glad we've got some agreements.

Disagreements, part 1: I don't know what more I can say on this issue. Probably the best thing I can say is, "those who do not understand history are doomed to repeat it." It sounds like you don't really want to understand what trial does because you've decided that it's wrong somehow. I suspect that a few months down the road you will simply realize that I was right, and that you are implementing a bunch of features from trial. If not, well, great, hopefully you'll release your testing tool and I'll learn something from it. One useful thing that I learned, as I was following your claims: Twisted does not scream at you nearly loud enough if you try to run the reactor from multiple threads simultaneously :). However, it does say " Reactor already running! This behavior is deprecated since Twisted 8.0". On balance, I'm pretty sure almost nobody has actually tried that.

Also, we are talking about "the Twisted approach" vs. "the threaded approach" here. I am fairly sure that most asyncore developers would agree that your usage of asyncore here runs counter to the spirit of asyncore, which was sort of the proto-Twisted, and uses a similar approach. More specifically, the Twisted approach here would be to use trial and run both test and SUT in the same event loop, not have two separate networking APIs talking to each other in two different threads of control. You could of course abuse Twisted in the same way that you have abused asyncore (and it would work about as well).

Part 2: The most significant thing that Twisted will do is fail deterministically, so that you can investigate the cause of the failure. A threaded approach will

Investigating the test failures that you've posted is way beyond the scope of this comment (it looks like you've done a lot of stuff wrong... ;-)). I'd need to look at the code in question to really figure it out anyway (test failure messages will only take you so far). Also, I have to note that most of the error messages are your own asserts failing.

That FirstError that you see is an exposure of an ugly implementation detail that rarely shows up. Can't find the ticket at the moment, but I believe work is actively underway to get rid of it.

Twisted has multiple definitions of a "failure" for a test, however, including unhandled, unexpected errors being logged, so it is perfectly possible to have multiple failures. Logged errors can be trapped after the fact, so they get reported after the test has run; they do not immediately fail.

Lakin Wecker said...

You're right, problems with any piece of code do not necessarily make the overall code of bad quality, but it definitely does affect that conclusion. twisted.mail does provide more features than the standard libraries, and if the core problem space I was presented with was surrounding email, I might reconsider twisted as a platform.

Regarding the concept of re-building a mini-twisted. I think I may now understand where you were coming from, and we may just be disagreeing because we are both lacking some common agreements on the goals and purpose of said code. For me, that code is finished. It does the job that I was presented with, and it is extremely unlikely that I'll revisit it in the near future. Our core business problem is _not_ email, and our code has moved away from an async model to a threaded model. The use of a thread with asyncore.loop was a quick, easy way to simulate a server that's running in a different process from our tests, so that we could ensure that our code connects to and sends an email with an external smtp server when the appropriate input is provided. For that reason, I doubt we'll significantly change the approach in those email tests, and we definitely won't be using them as a building block for future infrastructure. Hence our disagreement. If we were still using an async approach, and this was part of a larger project based on these assumptions then you would be right. We'd probably immediately run into synchronization issues with that code, and due to the async nature we'd likely solve them using a similar approach to twisted or trial.

I don't think abuse is an appropriate term. Threads have long been compared to multiple processes running on different CPUs, and when writing integration tests, we could have chosen to control our environment in a number of approaches: 1. start two processes, one for an email server, one for our tests. or 2. Start two threads one for our email server, and one for our tests. Either way, (async or not) we'd have synchronization issues in accessing the shared list of email addresses. 3. Use an async approach with the server and email running in the same process with a single thread, and deal with all of the startup, teardown, connection and non-deterministic email arrival order which that entails. The point of my post was that it was significantly easier, took less time and code to pull off #1, and that's why we chose that route, and it's those types of issues that bug me the most when using any particular paradigm or framework.

Regarding the twisted errors and asking you to point out what was wrong: Haha, Yeah, that was a bit of a sarcastic, rhetorical question. But I think it does illustrate my point. Neither the threaded nor twisted approach provide nearly enough information in their error messages to be useful on their own. In my experience, both cause me to just return to the code and consider the possible lines of execution. As it was, I only commented out one line. So there was only one additional thing that was "wrong" when compared to the original twisted-code that I posted.

I'd love to learn more about the manner in which twisted fails deterministically. In my experience, it appears to not be deterministic. It has multiple execution paths, and those paths are dependent on the external networking environment which, because it's out of twisted's control, makes twisted appear to be non-deterministic in its execution and failures. I plan to do a bit more research into the topic. First, I must find the appropriate material, and the first step to finding that material usually is asking the people who are familiar with the subject material to pass on resources they know about. So, where should I start reading to find out more about this?

In computer science, a deterministic algorithm is an algorithm which, in informal terms, behaves predictably. Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently. [1]

We'd have to agree on what parts of the entire problem space should be included when attempting to agree on what the 'same input' means. Would all external networked computers, and their state be included? Probably not because they're usually out of our control. Would all things outside of our process be excluded? If so, then that's unfair to the threaded approach as the os-scheduler is equally part of the problem which affects the order of execution as twisted's scheduler is. Should other processes interfering with shared computer resources be included? Maybe, both approaches would be affected by these in ways that would alter the outcome.

What is your definition of determinism when you say that twisted will fail deterministically? What is your definition of failure? All of these things are important for me to be able to understand, and (maybe) respond to that claim. It's required for me to begin to reconsider my feelings on the matter which prompted me to start this discussion.

[1] - http://en.wikipedia.org/wiki/Deterministic_algorithm

glyph said...

Re: mini-twisted: I understand your core business problem is not email. I also understand that at this point it may be too late to revisit your decision to abandon twisted for that particular problem. However, using and contributing to (something like) Twisted makes sense because your core problem isn't email. Do you really want to care when an SMTP server you need to talk to starts requiring TLS or authentication for some reason?

The decision may well be made for the code that you're talking about; I'm not saying that this problem is necessarily important enough for your business to spend resources going back to Twisted. But problems like this tend to build up over time, and it's definitely worth the investment to do what you're doing, i.e. learn more about how to use Twisted correctly. Alan Perlis said it well in Epigram 4: "Every program is part of some other program and rarely fits." You are always working on a larger system than you think you're working on.

re: "I'd love to learn more about the manner in which twisted fails deterministically. In my experience, it appears to not be deterministic."

You've caught me in a bit of sloppy thinking here; you're right, "Twisted" doesn't fail deterministically. Twisted's goal is, after all, to provide I/O and interaction with the system clock. Almost the definition of non-determinism.

Let me back up and explain what I was referring to, and try to describe it more correctly. The Twisted development team values predictable behavior, and deterministic behavior is the most predictable of all. However, total determinism isn't always possible, and it's easy to unintentionally invoke some library functionality which makes your code non-deterministic; Twisted tries to make your non-determinism more predictable in those cases. This is accomplished in a few ways: Twisted's programming model provides discrete nondeterminism, and Trial monitors both the most likely sources and consequences of nondeterminism: reactor state modifications, and logged error messages.

This state-tracking that trial does is really the part that I was referring to when I said that Twisted would "fail deterministically", and I didn't really describe it well at all. What I meant was, if you fail to clean things up, if you get a concurrency issue wrong, there are a lot of cases (although obviously not all) where Twisted+Trial will report the problem to you when other libraries (including those which also use async I/O like pyevent and asyncore) will go along as if everything's fine. For example, your threaded asyncore example can easily leak connections in socket_map between tests, causing other tests to fail in extremely nondeterministic ways; sources of nondeterminism from outside the test itself, even, that cannot be reproduced by running the test itself under different load; you have to run the tests in the same order under varying load.

Code written in a Twisted style will, generally, be more predictable than code written in a threaded style. The race conditions which can occur with non-deterministic Twisted code are sensitive to fewer issues than threaded code: rather than the interactions of N processes with the scheduler, I/O, clock, and system load, you only have to deal with the interactions of one process and the I/O that is arriving.

So, Twisted can definitely be non-deterministic. However, it provides a manageable level of non-determinism. In Twisted-style code, only the places where you explicitly yield control to the reactor are non-deterministic. In multithreaded code, every line of code becomes a potential source of non-determinism.

You seem to really like the Zen of Python, so I'll put it in those terms: Explicit is better than implicit. :)

Since you have a fixed number of points of non-determinism within Twisted code, it becomes reasonable to write unit tests for all the relevant interactions between different invocation orders. Every time you do something which yields control to the reactor, you can write tests which verify that it does what you want, regardless of the order that the provided callbacks are invoked in. You can control the order of invocations because everything's just a method call. In thread land this is, practically speaking, not possible, because you can't run every line in the system interleaved in every possible order with every other line in the system, or even interleaved in different orders just with itself.

So, in the Twisted style, it's easier to write code that works in the first place, since you can test it one step at a time. In many cases you don't even have to "fake" anything, since the "fake" part is the code that's doing the calling, not the code that's being called; you just have to make sure that it's OK to call a() b() as well as b() a().
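To make the a() b() versus b() a() point concrete, here is a minimal sketch in plain Python (no Twisted import needed). The Gatherer class and its gotHeader/gotBody callbacks are hypothetical names invented for this illustration; the only thing being shown is that the test itself plays the role of the caller and can invoke the callbacks in every order.

```python
import itertools


class Gatherer:
    """Hypothetical object that needs two results which may arrive in any order."""

    def __init__(self):
        self.header = None
        self.body = None
        self.delivered = []

    def gotHeader(self, header):
        self.header = header
        self._maybeDeliver()

    def gotBody(self, body):
        self.body = body
        self._maybeDeliver()

    def _maybeDeliver(self):
        # Deliver only once both halves have arrived, whichever came first.
        if self.header is not None and self.body is not None:
            self.delivered.append((self.header, self.body))


def run_in_order(calls):
    """Invoke the named callbacks on a fresh Gatherer in the given order."""
    gatherer = Gatherer()
    for name, arg in calls:
        getattr(gatherer, name)(arg)
    return gatherer.delivered


calls = [("gotHeader", "Subject: hi"), ("gotBody", "hello")]
# The test is the "scheduler": it tries every invocation order explicitly.
results = [run_in_order(order) for order in itertools.permutations(calls)]
```

Both orderings should produce the same single delivery. If one ordering ever misbehaves, that exact ordering can be kept as a regression test.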

But, of course, nobody's perfect and you will often discover you've messed up somehow. If you do that in a threaded program, how do you write a regression test to make sure that same race condition doesn't show up again? Actually, I know the answer to that question but it is a very ugly answer. Not enough room in an already-huge blog comment to describe it in detail, but you can see my most recent application of the required technique here: http://bit.ly/1gHmjJ . If you do it in a Twisted program, you just invoke the racy bits of logic in the "wrong" order, watch it fail, fix it, watch it pass. Your test is good until you change your API, and will catch any changes which re-introduce a similar enough race condition.

In summary, your statement that "the os-scheduler is equally part of the problem which affects the order of execution as twisted's scheduler", while superficially correct, is misleading. Twisted's scheduler is trivially reproducible and controllable from Python code. If we had a version of the Linux kernel with a Python API where you could dynamically and deterministically control the scheduler, then the two approaches might be closer together, but as it stands it's impossible to achieve the same level of control and understanding with threads as it is with events, let alone to do so just as easily.

Also, when you say "threads have long been compared to multiple processes running on different CPUs"... I'm not sure who you think is doing the comparing, but shared-state multithreading and message-passing multiprocessing are very different beasts. If you are doing multiprocessing with a control channel (whether or not you're using Twisted, or even a twisted-style approach with a single main loop), then the synchronization issues you face are very simple: only dole out email addresses when the control channel is being polled. You don't have to worry about it otherwise; access the list however you want. With multithreading you have to synchronize every possible access to that list, whether or not it has anything to do with communication. It's true that multiprocessing introduces some nondeterminism, but multithreading is much, much worse.

PJE said...

By the way, Twisted is unnecessarily non-deterministic in its handling of scheduling, given a low-enough granularity clock or a fast enough CPU. Two sequential callLaters may or may not be called in the order they were scheduled in. (Early versions of Twisted didn't have this problem, as they used a stable insertion order in the scheduling queue.)

For testing, I generally replace I/O operations and the clock with deterministic variants thereof; see e.g. test_reactor.py in the Trellis library. I find it very useful in general to run timing-related tests using simulated clocks.
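As a rough illustration of the simulated-clock idea, here is a minimal sketch in plain Python. FakeClock is a hypothetical class written for this example, not code from Twisted or Trellis; its callLater name only mirrors the reactor method for readability. It breaks ties between calls scheduled for the same time with an insertion counter, which gives the stable ordering described above. (Twisted itself ships a similar test helper, twisted.internet.task.Clock, with callLater and advance methods.)

```python
import heapq
import itertools


class FakeClock:
    """Hypothetical deterministic clock: time moves only when advance() is called."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        # Monotonic counter used as a tie-breaker so that calls scheduled
        # for the same time fire in the order they were scheduled.
        self._counter = itertools.count()

    def callLater(self, delay, func, *args):
        entry = (self.now + delay, next(self._counter), func, args)
        heapq.heappush(self._queue, entry)

    def advance(self, amount):
        deadline = self.now + amount
        # Run every call that is due on or before the new time.
        while self._queue and self._queue[0][0] <= deadline:
            when, _, func, args = heapq.heappop(self._queue)
            self.now = when
            func(*args)
        self.now = deadline


clock = FakeClock()
fired = []
clock.callLater(1, fired.append, "first")
clock.callLater(1, fired.append, "second")
clock.callLater(0.5, fired.append, "early")

clock.advance(0.75)
early_only = list(fired)  # snapshot after the first advance
clock.advance(1)
```

Because nothing here touches the real system clock, a timing-related test built this way runs instantly and gives the same result on every run.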