A collection of articles, ideas, and rambling from a guy who wrote some software that one time.

Thursday, July 17, 2008

Don't Call It Blogging

Despite my own impeccable credentials as an elite cyber-hacker, I am friends with a number of people who are bewildered by the profusion of different technologies that the internet now affords us to interact.  I recently had a conversation where someone was just confused by the whole "blogging" thing.  Why do people blagoblog on the intertron?  What is the point?  I'm a prolific "blogger" myself, I guess, but I found myself sympathizing as I tried to explain.

I'm a huge fan of the activity of blogging, but I have never liked the word, "blogging".  I never really understood why until I was attempting to explain what it's all about.

For thousands of years — well, okay I don't have any citations of exactly how long, due to the evolution of English as a language, but, for a really long time — we've had one word for the activity of "blogging".  We called it writing.  That's all you're doing when you're blogging.

If we were to describe the activity of a Sumerian scribe pressing symbols into soft clay, we'd say they were writing on that clay.  An ancient Egyptian putting words onto a sheet of papyrus: they are writing.  Similarly, we don't typically have separate words for "scrolling", "codexing", "booking", "newspapering", "magazining", and so on.  Each new technology for moving writing around didn't need a new verb.  So why has "blogging" gotten one?

I think there is a good reason this term exists, but that reason doesn't justify the term; it provides a warning, and a reason to actively resist the term and just say "writing".  The web is a more radical and democratizing shift in publishing technology than any of the ones which preceded it, so publishing on the web (especially automated publishing, as on a blog) affords a feedback cycle where the author and the audience are effectively peers.  In fact, the nature of the terms "author" and "audience" has changed; formerly a description of social classes, the people who produce and the people who consume, they have been re-framed as roles within an individual conversation.  You might be the audience when you're reading someone else's blog, but ten minutes later you can easily reverse that relationship with that author as you're writing your own.  This extremely rapid cycle has given a wholly new quality to the style of many blogs, unseen in any prior form of written media.

So, why resist the term "blogging"?  It confuses the possibilities that the medium presents with conventions that it enforces.  Writing is a powerfully diverse art.  A lot of it's good, a lot of it's bad.  "Blogging", however, is more specific, and unfortunately implies a sort of perpetual half-finished conversation.  It calls to mind a semi-private, informal, ephemeral, link-heavy style of extremely short-form writing.  This form has its masters: Tycho of Penny Arcade infamy leaps to mind immediately.  It also has a sea of mediocrity.  Statistically speaking, you can probably click the 'next blog' link at the top of this page for an immediate example.  I don't have a problem with any of this.  Even the "mediocrity" is just evidence of the degree to which this is empowering people: much of what I'd consider "mediocre" just isn't relevant to me, and isn't written for me.

But blogs can be, and are, so much more than that.  They are a disruptive technology in the world of publishing, where any style of writing can easily be published, circulated, and promoted.  One can write an entire novel, serialized chapter-by-chapter as blog posts.  Many people have, in fact, done this already.  You don't even need to emulate older forms of writing to step outside the style implied by "blogging".  The tools that the web affords — instant publishing, hyperlinks — are ideal for collaborative scientific research.  Hyperlinks take the work out of footnoting.

Prominent web writers whom I respect also seem to avoid the use of the term "blog".  Joel Spolsky refers to other people's blogs, but the term "blog" does not appear anywhere describing his site, despite the fact that there is quite a bit of self-descriptive text that refers to "this site".  Paul Graham goes a step further, forgoing many traditional blog trappings and offering a link that says, simply, "Essays".  I wonder if it's for this reason.

So, if you need to explain to someone who doesn't quite get what the whole "blogging" thing is about, don't talk about social dynamics and the singularity and the mass popularization of media.  That's all great stuff, but it's a confusing distraction.  It's just like writing a book or, more likely, a magazine.  Except you don't have to talk to a publisher.  And you don't have to have an editor.  And it's free.  And the publishing part doesn't actually take any time.  And it's accessible from anywhere in the world.  And you can read it on your cell phone.  When you stack up all the advantages, the lack of some bound paper doesn't seem like a big deal.

If you find this explanation useful, feel free to point your relatives at this post.  Tell them that you saw it on my blog, but don't tell them I blogged about it.  Tell them I wrote about it.

Wednesday, July 09, 2008

Conference FAIL

Last night at a dinner with Ivan Krstić and Itamar Shtull-Trauring, we were all lamenting that too many (all?) software conferences focus specifically on positive results. This is what you want, of course, if you treat a conference as purely a marketing venue. However, most learning takes place based on something that someone did wrong and then needed to correct, not something that they did right.

All of the great software developers I know have at least one great story of how a project they were working on was a complete disaster.  Often these projects are shielded from the public eye, since nobody wants to talk about failure.  So, how do we make a public discussion of these ideas socially acceptable?

Thus, an idea was born: FAILcon.  The idea is simple: submitted talks and papers must be related to projects which failed in an interesting way.  The larger the better, of course — the bigger they are, the harder they fail — but anything that failed in an interesting way would be a valid subject for discussion.

I'm writing about it so that it won't be forgotten, because I think it's a great idea.  But I doubt that any of us are going to organize a conference any time soon.  So please, steal this idea.  Does anyone out there with conference-organizing skills want to get something together based around the common theme of failure?

Friday, July 04, 2008

Static On The Wire

I am, as you might have guessed, a big fan of dynamic typing.  Yet, two prominent systems I've designed, the Axiom object database and the Asynchronous Messaging Protocol (AMP), have required systems for explicit declarations of types: at a glance, static typing.  Have I gone crazy?  Am I pining for my glory days as a Java programmer?  What's wrong with me?

I believe the economics of in-memory and on-the-wire data structures are very, very different.  In-memory structures are cheap to create and cheap to fix if you got their structure wrong.  Static typing to ensure their correctness is wasted effort.  On the other hand, while on-the-wire data structures (data structures which you exchange with other programs) can be equally cheap to create, they can be exponentially more expensive to maintain.

When you have an in-memory data structure, it's remarkably flexible.  It is, almost by definition, going to be thrown away, so you can afford to change how it will be represented in subsequent runs of your program.  So, when your compiler complains at you for getting the static type declarations wrong, it's just wasting your time.  You have to write unit tests anyway, and static typing makes unit testing harder.  What if you want a test that fakes just the method foo on an interface which also requires baz, boz, and qux, so you can quickly test a caller of foo and move on?  A really good static type system will just figure that out for you, but it probably needs to analyze your whole program to do it.  Most "statically typed" languages — such as the ones that actually exist — will force you to write a huge mess of extra code which doesn't actually do anything, just so all your round pegs can pretend to fit into square holes well enough to get your job done.
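To make the dynamic-typing side of that argument concrete, here is a minimal sketch.  The names `FakeFoo` and `call_foo_twice` are made up for illustration; only `foo`, `baz`, `boz`, and `qux` come from the paragraph above.

```python
class FakeFoo:
    """Test double that implements only foo(); baz, boz, and qux are omitted."""

    def __init__(self):
        self.calls = []

    def foo(self, value):
        # Record the call so a test can inspect it afterwards.
        self.calls.append(value)
        return value + 1


def call_foo_twice(thing, start):
    """Hypothetical code under test: it only ever touches thing.foo()."""
    return thing.foo(thing.foo(start))


fake = FakeFoo()
result = call_foo_twice(fake, 1)
```

Nothing forces `FakeFoo` to declare the rest of the interface, which is exactly the shortcut that a nominal static type system would typically make you pay for with stub methods.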

But I don't have to convince you, dear reader.  I'm sure the audience of this blog is already deeply religious on this issue, and they've got my religion.  I'm just trying to make sure you understand I'm not insane when I get to this next part.

The most important thing that I said about in-memory data structures, above, is that you throw them away.  It's important enough that I'll repeat it a third time, for emphasis: you throw them away.  As it so happens, the inverse is the most important property of an on-the-wire data structure.  You can't throw it away.  You have to live with it.

Forever.

Oh, sure, you told your customers that they all have to upgrade to protocol version 3.5, but they're still using 3.2.  Unless you're Blizzard Entertainment, you can't tell them to download the new version every six weeks or go to hell.  Even if you can do that (and statistically speaking, you probably aren't Blizzard Entertainment) you have to keep the old versions of the updater protocol around so that when version 4.0 comes out all the laggards who haven't even run your program since 3.0 can still manage to upgrade.

Here's the best part: your unit tests aren't going to help you, at least not in the same way they would with your in-memory data.  When you change an in-memory data structure, you aren't supposed to have to change your unit tests.  You want the behavior to stay the same, you don't change the tests; if they start failing, you know something is wrong. With your new protocol changes though, you can have tests for the old protocol, and tests for the new protocol, but every time you make a protocol change you need to add a new test for every version of the protocol which you still support.  Plus, you probably can't stop supporting older versions of the protocol (see above).

If you've got a message X[3], and you're introducing X[4], you have to make sure that X[4] can talk to X[3] and X[2] and X[1].  Each of those is potentially a new test.  Each one is more work.  Even worse, it's possible to introduce X[4] without realizing that you've done it!  If you have a new, optional argument, let's call it "y", to a dynamically-typed protocol, your old tests (which didn't pass y) will pass.  Your new tests (which do pass y, to the newly-modified X[4]) also pass.  But there's a case which has now arisen which your tests did not detect: y could be passed to a client which only supports X[3], and an error occurs.
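The blind spot described above can be reproduced in a few lines.  This is only a sketch: the two handler functions and the argument name "a" are invented stand-ins for X[3] and X[4], and the strict rejection of unknown arguments in the old handler is an assumption about how X[3] happened to be written.

```python
def handle_x_v3(payload):
    """Hypothetical X[3] receiver: written before "y" existed."""
    allowed = {"a"}
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError("unknown arguments: %s" % sorted(unknown))
    return payload["a"]


def handle_x_v4(payload):
    """Hypothetical X[4] receiver: accepts the new optional "y"."""
    return payload["a"] + payload.get("y", 0)


# The old tests (no "y") and the new tests ("y" sent to X[4]) both pass...
old_ok = handle_x_v4({"a": 1}) == 1
new_ok = handle_x_v4({"a": 1, "y": 2}) == 3

# ...but nothing exercised a new sender talking to an old receiver.
try:
    handle_x_v3({"a": 1, "y": 2})
    cross_version_ok = True
except ValueError:
    cross_version_ok = False
```

The only place `handle_x_v3` still exists is in revision history or on someone else's machine, so a test suite that imports the current code never sees this failure.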

If these were in-memory structures, that case would no longer exist.  There is no version of X currently in your code which cannot accept y.  Your tests ensure that.  You have to time-travel into the past for your unit tests to discover the code which would cause them to fail.  You can't just do it once, either: maybe X[3] was designed to ignore all optional parameters.  You have to consider X[2] and X[1].  You have to travel back to all points in time simultaneously.

This is why I said that the cost is exponential: you carry this cost forward with each new supported version that gets released.  Of course, there are ways to reduce it.  You can design your protocol such that arguments which your implementation doesn't understand are ignored.  You can start adding version numbers to everything, or change the name of every message every time some part of its schema changes.  All of these alternatives get tedious after a while.
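As a rough illustration of two of those mitigations (ignoring arguments you don't understand, and carrying a version number), here is a sketch.  The message layout, with "version" and "arguments" keys, is an assumption made up for this example and not the format of any real protocol.

```python
KNOWN_ARGUMENTS = {"a"}  # what this (hypothetical) receiver understands


def receive(message):
    """Keep only the arguments this version knows about; drop the rest."""
    version = message.get("version", 1)
    arguments = {
        name: value
        for name, value in message.get("arguments", {}).items()
        if name in KNOWN_ARGUMENTS
    }
    return version, arguments


version, arguments = receive({"version": 4, "arguments": {"a": 1, "y": 2}})
```

The trade-off is visible even here: the sender gets no signal that "y" was dropped, so every optional argument has to be safe to lose.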

So what does this have to do with static typing?  Static type declarations can save you a lot of this work.  For one thing, it becomes impossible to forget you're changing the protocol.  Did you change the data's types?  If so, you need to add a compatibility layer.  These static type declarations give you key information: what do the previous versions of the protocol look like?  More importantly, they give your code key information: is an automatic transformation between these two versions of the data format possible?  (If not, is the manual transformation between these two versions correct?)
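One way to picture what the declarations buy you: once each version's schema is written down as data, code can compare them.  The sketch below assumes schemas are plain dicts from argument name to type; `schema_changes` and `can_auto_downgrade` are hypothetical helpers, not part of AMP or Axiom.

```python
X_V3 = {"a": int}
X_V4 = {"a": int, "y": int}


def schema_changes(old, new):
    """Return (added, removed, retyped) argument names between two schemas."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(n for n in set(old) & set(new) if old[n] is not new[n])
    return added, removed, retyped


def can_auto_downgrade(old, new):
    """True if a new message becomes a valid old one by dropping added names."""
    added, removed, retyped = schema_changes(old, new)
    return not removed and not retyped


added, removed, retyped = schema_changes(X_V3, X_V4)
```

A non-empty result from `schema_changes` is the "did you change the data's types?" question answered mechanically, and `can_auto_downgrade` is a crude version of "is an automatic transformation possible?".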

In a dynamically typed program, you can figure out what your in-memory types are doing by running the debugger, inspecting the code that's calling them, and simply reading the code.  Sometimes this can be a bit spread out (in a badly designed system, painfully spread out), but the key point is that all the information you need is right in front of you, in the source code.  If you're working on code that is shipping data elsewhere without an explicit schema, you have to have a full copy of the revision history and some very fancy revision control tools telling you what the protocol looked like in the past.  (Or, perhaps, what the protocol developed by some other piece of software used to look like.)

Your disk is another kind of wire.  This one is particularly brutal, because while you might be able to tell someone to download a new client to be able to access a service, there is no way you are ever going to get away with saying "just delete all your data and start again.  There's a new version of the format."  When writing objects to disk (or to a database), you might not be talking across a network, but you're still talking to a different program.  A later version of the one you're writing now.  So these constraints all apply to Axiom just as they do to AMP; more so, actually, because in the case of AMP all the translations can be very simple and ad-hoc, whereas in Axiom the translations between data types need to be specifically implemented as upgraders.
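To give a feel for what "implemented as upgraders" can mean, here is a generic sketch of chained, one-version-at-a-time upgrades for stored records.  It is not Axiom's API: the `upgrader` decorator, the `UPGRADERS` registry, the record layout, and the field names are all assumptions invented for this example.

```python
UPGRADERS = {}  # maps an old version number to a function producing the next


def upgrader(from_version):
    """Register a function that upgrades a stored record by one version."""
    def register(function):
        UPGRADERS[from_version] = function
        return function
    return register


@upgrader(1)
def add_email(record):
    # Version 2 added an "email" field; old records get an empty default.
    return dict(record, version=2, email="")


@upgrader(2)
def split_name(record):
    # Version 3 replaced "name" with separate first and last names.
    first, _, last = record["name"].partition(" ")
    upgraded = {k: v for k, v in record.items() if k != "name"}
    upgraded.update(version=3, first=first, last=last)
    return upgraded


def load(record, current_version=3):
    """Apply one-step upgraders until the record reaches current_version."""
    while record["version"] < current_version:
        record = UPGRADERS[record["version"]](record)
    return record


loaded = load({"version": 1, "name": "Ada Lovelace"})
```

Because each upgrader only knows about two adjacent versions, data written by any old release can still be read, at the price of keeping every upgrader around forever.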

With a network involved, you also have to worry about an additional issue of security.  One way to deal with this is by adding linguistic support to the notion of untrusted code running "somewhere else", but type declarations can provide some benefit as well.  Let's say that you have a function that can be invoked by some networked code:

@myprotocol.expose()
def biggger(number):
    return number * 1000


Seems simple, seems safe enough, right?  'number' is a number taken from the network, and you return a result to the network that is 1000 times bigger.  But... what if 'number' were, instead, a list of 10,000 elements?  Now you've just consumed a huge amount of memory and sent the caller 1000 times as much traffic as they've sent you.  Dynamic typing allows the client side of the network connection to pass in whatever it wants.

Now, let's look at a slightly different implementation of that function:

@myprotocol.expose(number=int)
def biggger(number):
    return number * 1000


Now, your protocol code has a critical hint that it needs to make this code secure.  You might spell it differently ("arguments = [('number', Integer())]" comes to mind), but the idea is that the protocol code now knows: if 'number' is not an integer, don't bother to call this function.  You can, of course, add checks to make sure that all the methods you want to call on your arguments are safe, but that can get ugly quickly.
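Since `myprotocol` is imaginary, here is one way such a decorator might behave, as a sketch only.  The `expose` below is a stand-in written for this example; it checks the declared type before the application function is ever called.

```python
import functools


def expose(**declared):
    """Hypothetical stand-in for myprotocol.expose: check declared types."""
    def decorate(function):
        @functools.wraps(function)
        def from_network(**arguments):
            for name, expected in declared.items():
                value = arguments.get(name)
                # Exact type match: a missing argument, a list, or even a
                # bool passed where an int was declared is rejected.
                if type(value) is not expected:
                    raise TypeError("%s must be %s" % (name, expected.__name__))
            return function(**arguments)
        return from_network
    return decorate


@expose(number=int)
def biggger(number):
    return number * 1000


try:
    biggger(number=[0] * 10000)
    rejected = False
except TypeError:
    rejected = True
```

The 10,000-element list is turned away at the boundary, so the multiplication that would have produced ten million elements never runs.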

Let's break it down.

Static type declarations have a cost.  You (probably) have to type a bunch of additional names for your types, which makes it difficult to write code quickly.  Therefore it is preferable to avoid that cost.

All the information you need about the code at runtime is present when you're looking at your codebase.  Therefore — although you may find its form more convenient — static type declarations don't provide any additional information about the code as it's running.  However, information about the code on opposite ends of the wire may only be in your repository history, or it may not be in your code at all (it could be in a different codebase entirely).  Therefore static typing provides additional information for the wire but not in memory.

At runtime, you only have to deal with one version of an object at a time.  On the wire, you might need to deal with a few different versions simultaneously in the same process.  Static type declarations provide your application with information it may need to interact with those older versions.

At runtime (at least in today's languages) you aren't worried about security inside your process.  Enforcing type safety at compile time doesn't really add any security, especially with popular VMs like the JVM not bothering to enforce type constraints in the bytecode, only in the compiler.  However, static type declarations can help the protocol implementation understand the expectations of the application code so that it does not get invoked with confusing or potentially dangerous values.  Therefore static type declarations can add security on the wire while they can't add security in memory.  (It turns out that if you care about security in memory, you need to do a bunch of other stuff, unrelated to type safety.  When the rest of the world catches up to the E language I may need to revisit my ideas of how type safety helps here.)

If you have data that's being sent to another program, you probably need static type declarations for that data.  Or you need a lot of memory to store all those lists I'm about to multiply by 1000 on your server.

Wednesday, July 02, 2008

Constructive Criticism

I frequently say that I'm a big fan of constructive criticism of Twisted, but I rarely get it.  People either gush about how incredibly awesomely spectacularly awesome Twisted is, or they directionlessly rant about how much it sucks, but aside from a fairly small group of regulars who file issues on the Twisted tracker, I don't hear much in between.

I caught wind of (and responded to) some blog comments of the latter type (directionless ranting) from Lakin Wecker.  After I responded, in an unusual response for someone writing such comments, he apologized and promised to do much better.  He has responded with some much more specific and potentially constructive criticism, ominously entitled "twisted part 1".

Lakin, thanks for reformulating your complaints in a more productive way.  I do think that some useful things might happen as a result of this article.  While I don't necessarily agree with it, I do care about this type of criticism.  In order to demonstrate my appreciation, I will try to make this a thorough reply.

It sounds like there are several mostly separate issues that you had here.  I'll address them one at a time.

Twisted Mail

I believe that the main issue is that the twisted.mail API is missing some convenience functionality which will allow users to quickly build SMTP responders that deal with whole messages.  This is definitely a shortcoming of twisted.mail.

However, this shortcoming is not entirely unintentional.  In general, Twisted's interfaces encourage you to write code which scales to arbitrary volumes of input.  IMessage is a thing that can receive a message, rather than a fully parsed in-memory message, because we want to encourage users to write servers that don't fall over.  If you have to handle each line as it arrives, it's less likely that you'll die if you receive a message bigger than the memory of the machine that is running the server.
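Here is a plain-Python sketch of that line-at-a-time shape.  The method names echo those of twisted.mail.smtp.IMessage, but the class below does not import Twisted and makes no claim to satisfy the real interface, which has further requirements this sketch leaves out; `SpoolingMessage` is a name invented for the example.

```python
import tempfile


class SpoolingMessage:
    """Plain-Python sketch of a line-at-a-time message receiver."""

    def __init__(self):
        self.spool = tempfile.TemporaryFile()
        self.size = 0

    def lineReceived(self, line):
        # Each line goes straight to disk, so memory use stays constant
        # no matter how large the message turns out to be.
        self.spool.write(line + b"\n")
        self.size += len(line) + 1

    def eomReceived(self):
        # End of message: rewind so a consumer can stream it back out.
        self.spool.seek(0)
        return self.spool

    def connectionLost(self):
        # The sender went away mid-message; discard the partial data.
        self.spool.close()


message = SpoolingMessage()
for line in [b"Subject: hello", b"", b"body"]:
    message.lineReceived(line)
spooled = message.eomReceived().read()
```

At no point does the object hold more than one line in memory, which is the property the interface is nudging you towards.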

That's not to say that there shouldn't be some additional, higher-level interface which does what you want.  Quotient, for example, uses twisted.mail, but provides a representation of a message which has all of its data written to disk first, and efficient APIs for accessing things like headers without fetching the whole message back into memory.  twisted.mail almost provides something like this itself; if you poke around in twisted.mail.maildir and twisted.mail.mail, you'll find FileMessage (an implementation of a message which writes its contents to disk) and MaildirDirdbmDomain (an implementation of IDomain which uses a directory of maildirs to deliver messages).  Note that these would not have been useful for your use case: they just show that we're happy to have higher-level stuff implemented within Twisted.

One function which might be cool to provide is something which will parse an incoming SMTP message and convert it to an email.Message.Message, then hand it off to some user code.  Even better would be to integrate this with the command-line "twistd mail" tool, such that you could easily deploy such a class as an LMTP server or something like that.
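A convenience layer along those lines might look roughly like the sketch below.  It uses only the standard library's `email` package (in current Python the parsed class lives at `email.message.Message`); `WholeMessage` and its callback argument are hypothetical names, and nothing here is wired into Twisted.

```python
import email


class WholeMessage:
    """Sketch: buffer an incoming message, parse it, hand it to user code."""

    def __init__(self, callback):
        self.callback = callback
        self.lines = []

    def lineReceived(self, line):
        self.lines.append(line)

    def eomReceived(self):
        # Trades the streaming property for convenience: the whole
        # message is held in memory while it is parsed.
        parsed = email.message_from_bytes(b"\n".join(self.lines))
        return self.callback(parsed)


received = []
message = WholeMessage(received.append)
for line in [b"Subject: hello", b"To: a@example.com", b"", b"body"]:
    message.lineReceived(line)
message.eomReceived()
```

User code then deals with one parsed message object instead of individual lines, which is fine as long as you are willing to bound message size some other way.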

Although we don't have all the pieces you need, there is also the ever-present issue of documentation of the pieces which we do have.  Some of the code in twisted.mail might have been useful to you if its documentation had been better.  For example, you might also notice some pretty strong similarities between twisted.mail.protocols.DomainDeliveryBase.receivedHeader and your own implementation of that method.

My main point here is that fixing this is a simple matter of programming (or, in the latter case, of documenting).  I think that the best way to deal with that shortcoming is simply to submit patches to twisted.mail which add the functionality that you want.  Lots of open source projects are like this: they were driven just far enough to satisfy their implementors' use-cases.  twisted.mail is a perfectly functional and simple API if you want to build what it is designed to build.

When we're talking about "Twisted", we're typically talking about the core, and the programming model that comes with it.  When you get into the specifics of an API like twisted.mail, twisted.names, and even twisted.web (maybe even especially twisted.web) you're going to find plenty of shortcomings and areas where it doesn't yet do what you need.  There are some areas which are downright bad, and some which are so bad that they're embarrassing.  We need volunteers to identify the areas that are lacking and add to them.

Twisted vs. Things Which Are Not Twisted

The reason that I disagree with your conclusion that Twisted as a whole is necessarily more complex, hard to explain, too dense, unreadable (etc, etc) is that the main thing to compare it to is shared-state multithreaded socket servers, or asyncore.

Here's a good example of what makes Twisted simple, at its core:
from twisted.internet.protocol import Protocol
class Echo(Protocol):
  def dataReceived(self, data):
    self.transport.write(data)

This server supports a large number of clients.  It supports TLS.  It's cross-platform.  It supports both inbound and outbound connections.  And yet, including the import, it's only 4 lines of code.  You can write a threaded version of this which appears to be just as short, but it's pretty much impossible to do without getting a half-dozen subtleties of either a socket API or a concurrency issue wrong.

For example, consider your "smtp_helper.py".  You don't provide any documentation of its concurrency properties, but the implementation of 'start' is almost certainly wrong.  For one thing, starting the same TestSMTPServer twice, or even starting two completely different TestSMTPServers at the same time, will not work.  Of course, you'd never do that, but let's say your SMTP client also used asyncore and a thread.  Now you've got a client using socket_map in the main thread and a server using socket_map in another thread.  Also, there's the fact that process_message may be called from an arbitrary thread; if it ever grew to do anything more complex than appending to a list, it would need its own serialization logic.  This isn't something that could be fixed: the entire approach is wrong, and you would need to rewrite all of your tests to work completely differently in order to fix it.  You'd need to asynchronously start both your client and your server, then have an API for letting your tests know when both of them are done.  By the time you're doing that, you're practically implementing your own mini-Twisted, along with extensions to unittest that turn it into Trial.

Ironically, you can use Twisted to fix this problem.  If you really like the API presented by the 'smtpd' module, you could write a wrapper which would make an asyncore dispatcher look like a Twisted protocol factory (or protocol), and hook asyncore into the main loop, then use 'trial' for your testing.  How exactly one would implement such a thing is beyond the scope of this post, but it's not actually that hard; just look at the relatively few methods that asyncore.dispatcher calls on self.socket and you'll probably get the idea.

I feel that the comparison of "Twisted" versus "non-Twisted" code you've presented is a bit unfair.  The Twisted example is a demonstration of utility functionality that Twisted Mail is missing, not a core idea that Twisted implements wrong.  The code it is being compared to looks simple only because critical areas of correctness that would need to be addressed in a real system (and will probably eventually need to be addressed, if the test is maintained for a long time) are being completely ignored.  The Twisted example, if it fails, will fail relatively straightforwardly; the other example's failure mode will be an obscure traceback coming out of otherwise unrelated (but not thread-safe) code.

However, your subjective experience of some areas of Twisted being hard to understand and use is entirely valid.  Your detailed description of why it was difficult for you has already been useful, but I hope you will stick around and help us improve the situation for future users as well.

Trial and Testing

Perhaps the more significant issue that you discovered while you were working on this is the subtle mystery of getting Twisted to fully shut down a connection and a bound port inside a test.  This is really way too hard, and it is a problem which affects anyone who wants to use Trial for integration testing.

Although I'd really like to see this problem dealt with in a systematic way, and I'd like it to be easy as pie to write integration tests with trial, there is a reason that the issue hasn't been fixed.  As the Twisted team has been improving our testing skills, we've been finding more and more that you absolutely need good unit tests before you can really write integration tests.  Without unit tests, you don't know whether the individual pieces work, so they tend to break in surprising ways when you put them together.  In Twisted itself we are still in the process of rehabilitating a very large, and very old hodgepodge of unit, functional, and integration tests to be broken down into smaller, more coherent unit tests.  Until that process is finished, and trial has been tuned to be as good as possible for that sort of testing, integration testing isn't going to be a focus of any core developer.

I agree with the advice that you were given on IRC.  We could eliminate the particular surprise of doing a clean connection shut-down in trial, and provide a good way to do it, but you'd still face issues with your tests where the SMTP API might be scheduling timed calls or doing other things behind your back which would be difficult to monitor or shut down.  Talking to a mock message-sending implementation for starters would be a lot easier.

I can understand your concern about passing more parameters.  Luckily, this is Python: you don't necessarily need to change the interface of the system you're testing.  If you have a system, A, that depends on another system, B, to perform some of its work, you need to have a reference from A to B somewhere.  That can be passed as a parameter, imported as an object, or loaded as a module.  In Java, you'd need to change all your type declarations and do some kind of dependency injection magic, but in Python you can always cheat.  The worst case in Python, after all, is that A imports B as a module.  So, if you don't want to add any parameters, or even any attributes or methods, consider this:

# A.py
import B

def stuff():
  B.functionFromB().otherStuff()

# test_A.py
import unittest
import A
import B

class MyTest(unittest.TestCase):
  def functionFromB(self):
    result = B.functionFromB()
    # Modify the result for the test, if you like
    return result

  def setUp(self):
    A.B = self
  def tearDown(self):
    A.B = B


Some might consider this a bit gross, of course.  It might be cleaner to add a specific API for plugging in a different implementation of B.  However, it's useful to use this technique in cases — such as the one you described in your post — where you are trying to add some test coverage for an API which has already been written and you don't have control over.
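For comparison, the "specific API" alternative could be as small as a defaulted parameter.  This is a sketch under assumptions: `RealB` stands in for module B so that the example is self-contained, and `FakeB` is a made-up test double.

```python
class RealB:
    """Stand-in for module B so the sketch is self-contained."""

    @staticmethod
    def functionFromB():
        return "real"


def stuff(b=RealB):
    # Callers who pass nothing get the real B; tests can pass a double.
    return b.functionFromB()


class FakeB:
    """Test double with the one function that stuff() relies on."""

    @staticmethod
    def functionFromB():
        return "fake"
```

Calling `stuff()` uses the real implementation and `stuff(FakeB)` uses the double, with no global state to restore afterwards; the cost is that the signature of `stuff` changes, which is exactly what the module-swapping trick avoids.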

I hope that digression helped, but I don't want to turn this into a screed about what you could have done better; let's consider your requirements as fixed (this needs to be an integration test) and look at what Twisted could have done better.

One thing the core team has been talking a lot about lately has been the development of verified test doubles.  We don't have a lot of them, and we need more.  For example, if you could pass a fake reactor to both your SMTP sender and receiver code, then you could manually make sure it was sending traffic at the appropriate times, to the appropriate hosts, and fail your test in sensible ways if it did something unexpected, rather than just having trial bomb out on you.  This would also let you have regression tests to make sure that your code was working with the latest version of Twisted, in case the APIs in question changed.  You wouldn't need your test to have a full, complete, clean shutdown of your SMTP connections because they would simply be garbage collected, as they would not be connected to the real reactor.  You can see an example of what this might look like in twisted.internet.task.Clock.  If someone contributed a real, documented, usable, verified test double for IReactorTCP, we would all be eternally grateful, especially if they could coalesce all the uses of the numerous half-assed attempts at it in our own test suite.
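To show the general shape of such a test double, here is a hand-rolled fake scheduler.  It borrows the method names `callLater` and `advance` from twisted.internet.task.Clock, but it is a simplified sketch written for this post, not Twisted's implementation and not a verified double.

```python
class FakeClock:
    """Hand-rolled fake scheduler; a sketch, not Twisted's own Clock."""

    def __init__(self):
        self.now = 0.0
        self.pending = []  # list of (due_time, function, args)

    def callLater(self, delay, function, *args):
        self.pending.append((self.now + delay, function, args))

    def advance(self, amount):
        # Move time forward and run everything that has come due, in order.
        self.now += amount
        due = sorted((c for c in self.pending if c[0] <= self.now),
                     key=lambda c: c[0])
        self.pending = [c for c in self.pending if c[0] > self.now]
        for _, function, args in due:
            function(*args)


fired = []
clock = FakeClock()
clock.callLater(5, fired.append, "timeout")
clock.advance(4)
fired_early = list(fired)
clock.advance(1)
```

The test decides when time passes, so a five-second timeout can be checked instantly and nothing is left scheduled on a real reactor when the test ends.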

Something else we could do is write a supported factory wrapper which would allow the use of a real factory and connection in a trial test, but that would shut everything down cleanly at the connection level in tearDown.  I would personally like this a lot, but I can't promise that it would be popular with the rest of the Twisted team.  We all spend a lot of time trying to convince people to write unit tests before integration tests.  I know that I'm a little concerned that providing great integration testing support will just lead to more people being confused by weird interactions in the guts of whatever protocol they're talking to.  Eventually, however, integration tests can be useful, and I wrote the beginnings of the wrapper that I'm suggesting when I was writing tests for the AMP protocol.  You might be able to use that as an example even if Twisted doesn't provide any public APIs for that sort of thing.

Conclusions

Unfortunately there's not much I can do immediately to fix the problems that you've had, Lakin.  If someone with a similar level of Twisted experience attempts a similar task in the near future, it's likely that they'll hit the same issues.  I barely (read: didn't actually) have the time to write this blog post, and I definitely don't have the time to fix the problems I've outlined.

While there are definitely some problems here, I don't think the situation is really all that bad.  According to your post, learning enough about Twisted to do what you were doing and writing the Twisted version of this code took only 3 days.  This learning curve is not as steep as some have accused Twisted of having.  Presumably it would have taken someone already familiar with twisted.mail and trial much less time.  It didn't take me much more than 2 minutes to read and understand it :-).  As I mentioned above, your friend's threaded smtpd implementation has some pretty severe problems which might cause maintenance headaches later, whereas you were quite careful to do a proper shutdown (the trickiest thing to get right) in the Twisted version, so it is likely to be fairly robust going forward.