A collection of articles, ideas, and rambling from a guy who wrote some software that one time.

Saturday, September 12, 2009

What I Wish Tornado Were

FriendFeed has released its web server, Tornado.  It seems like everyone's blogging about it, and it's obviously relevant to my interests, so I feel like I should say something.

Let me start with the good stuff.  First of all, I think it's great that we have yet another asynchronous contender in the Python world.  Every time something like this comes out, it means that Twisted has to fight that much less hard to get over the huge hump of event-driven programming being too hard, or too weird, or whatever.  It's good to have an endorsement of the general message "if you need a web server to handle COMET requests, it needs to be asynchronous to perform acceptably" from such a high-profile company as Facebook.

Unfortunately I think the larger picture here is a failure of communication in the open source community.  In the course of developing Tornado, there are several things that FriendFeed could have done to move the Twisted community forward, at no cost to themselves.  I don't want to rag on FriendFeed, or Bret Taylor, or Facebook here; they're not the first to re-write something without communicating.  In fact I recently had almost this exact same discussion with another project that did the same thing.  Since Tornado is such a high-profile example, though, I want to draw attention to the problem so that there's some hope that maybe the next project won't forget to communicate first.

My main point here is that if you're about to undergo a re-write of a major project because it didn't meet some requirements that you had, please tell the project that you are rewriting what you are doing.  In the best case scenario, someone involved with that project will say, "Oh, you've misunderstood the documentation, actually it does do that".  In the worst case, you go ahead with your rewrite anyway, but there is some hope that you might be able to cooperate in the future, as the project gradually evolves to meet your requirements.  Somewhere in the middle, you might be able to contribute a few small fixes rather than re-implementing the whole thing and maintaining it yourself.

This is especially important if you are later going to make claims about that project not living up to your vaguely-described requirements, and thereby damage its reputation.  Bret Taylor claims in his blog:

We ended up writing our own web server and framework after looking at existing servers and tools like Twisted because none matched both our performance requirements and our ease-of-use requirements.

First and foremost, it would have been great to hear from Bret when he started off using Twisted about any performance problems or ease-of-use problems.  I'm guessing that Twisted itself had only ease-of-use problems, and other "tools like Twisted" were the ones with performance problems, since later, in a comment on the same post, he says:

I can't imagine there is much of a performance difference [between Twisted Web and Tornado].  The bottom is not that complex in my opinion.

It would also be great if he had explicitly said that Twisted didn't have performance problems rather than making me guess, because I'm sure that is what lots of developers will take away from this.  When you have the bully pulpit, off-the-cuff comments like this can do serious damage to smaller projects.

More to the point, what is the problem with "ease of use", exactly?  The fact that he found Deferred tedious, in particular, seems very strange to me, given that it is so un-tedious that it has become a de-facto standard even in the JavaScript community.  We had no opportunity to help him or anyone else out, because as far as I can tell from searching our archives, we never heard from him or from anyone else at FriendFeed when they were trying out Twisted at first.  Even as he's saying that Twisted is hard to use and (maybe?) performs poorly, he isn't pointing to any particular example of what about it is hard to use, or what performs poorly.  There's still nothing we can do to address this criticism.  And there's still not much we can do to make sure that future potential Twisted users won't have this problem.
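
For readers who haven't met a Deferred, here is a minimal sketch of the idea in plain Python.  It is a toy stand-in, not Twisted's implementation: the names ToyDeferred, add_callback, and fetch_page_later are made up for illustration, and error handling is left out entirely.  It only shows the shape of the pattern: an operation hands back a placeholder right away, you attach callbacks to it, and they run in order once the result arrives.

```python
class ToyDeferred:
    """A toy stand-in for a Deferred-style object (NOT Twisted's code)."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        """Register fn; if the result already arrived, run fn immediately."""
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self  # returning self allows chained registration

    def callback(self, result):
        """Deliver the result and run the registered callbacks in order."""
        if self._fired:
            raise RuntimeError("already fired")
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            # Each callback receives the previous callback's return value.
            self._result = fn(self._result)
        self._callbacks.clear()


def fetch_page_later():
    """Hypothetical async operation: returns a placeholder with no result yet."""
    return ToyDeferred()


seen = []
d = fetch_page_later()
d.add_callback(lambda body: body.upper())
d.add_callback(seen.append)
d.callback("hello")  # stands in for the event loop delivering the result
```

Once the last line runs, seen holds the upper-cased string; the caller never blocks waiting for it.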

Later, in yet another comment, Bret points out the root problem:

... the HTTP/web support in Twisted is very chaotic (see http://twistedmatrix.com/trac/wiki/WebDevelopme... - even they acknowledge this)...

This is true.  However, as I frequently like to note, Twisted is starved for resources.  Reconciling the chaos described on the page about web development with Twisted is an ongoing process.  For a tiny fraction of the effort invested in Tornado, FriendFeed could have worked with us to resolve many of the issues creating that chaos.

This is the main thing I want to reinforce here.  If half a dozen occasional contributors with a real focused interest in web development showed up to help us on Twisted, we'd have an awesome, polished web story within a few months.  If even one person really took responsibility for twisted.web, things would pick up.  But if everyone who wants an asynchronous webserver either uses twisted.web (because it's great!) without talking to us or decides not to use it (because it doesn't meet their unstated requirements) without talking to us, it's going to continue to improve at the same sluggish pace.

Even at the current rate, by the time we have an excellent HTTP story, I somehow doubt that Tornado will have a good SSHv2 protocol story ;-).

In his comment, Bret also takes a couple of pot-shots at Twisted that I think are unnecessary, and I'd like to address those too.

In general, it seems like Twisted is full of demo-quality stuff, but most of the protocols have tons of bugs.

We're not talking about "most" of the protocols here; Tornado is only concerned with HTTP.  And the HTTP implementation(s) in Twisted do not have "tons of bugs".  They are production quality, used on lots of different websites, and have lots of automated tests.  While much of the code in twisted.web doesn't have complete test coverage, since it's old enough to predate our testing requirements, I note that Tornado appears to have zero test coverage.

There's a kernel of truth here (some of the older, less frequently used protocols have a few problems), but in most cases the "bugs" are really just a lack of functionality.  Twisted overall has very few protocol-related bugs, and again, our test policy makes sure that we get new bugs very rarely.

Given all those factors, it didn't seem to provide a lot of value. Our core I/O loop is actually pretty small and simple, and I think resulted in fewer bugs than would have come up if we had used Twisted.

I must respectfully disagree.  Again, I don't want to rag on FriendFeed here, but here are several features that Tornado would have, and bugs that it wouldn't have, if it used Twisted for the event loop and none of the HTTP stuff:
  1. EINTR wouldn't cause your application to exit if run in a non-US-english locale.
  2. You don't have the opportunity to forget to set a socket to be non-blocking and thereby make your entire application stop.
  3. It would be possible to run your application on Windows.
  4. Firewalled connections and running out of file descriptors wouldn't cause your server to spew errors forever (at least, it won't any more).
  5. You could write a TCP client that didn't block for an arbitrary amount of time in connect().
  6. Finally, of course, you could use all of Twisted's other protocols, client and server: IMAP, POP, SMTP, IRC, AIM, etc.  You could also use external protocol implementations like Thrift.
  7. You could spawn asynchronous subprocesses.
And this is a very short list, based on a cursory reading of the source code, not on actually running Tornado, and not on a particularly deep audit.  Some of these bugs might not be as serious as I think, and there might be plenty of other bugs.  But I can't really be sure what works, since again: there are no automated tests.
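
To make items 2 and 5 concrete, here is a rough sketch, using only the Python standard library, of what a non-blocking connect involves.  It assumes a POSIX-like platform; start_connect and finish_connect are hypothetical helper names, not part of Twisted or Tornado, and a real event loop would register the socket with its poller rather than calling select directly.

```python
import errno
import os
import select
import socket


def start_connect(address):
    """Begin a TCP connect without blocking; return the socket immediately."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)         # the step that is easy to forget (item 2)
    err = sock.connect_ex(address)  # returns an errno value instead of raising
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


def finish_connect(sock, timeout):
    """Wait up to `timeout` seconds for the connect; True if it succeeded."""
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return False  # still pending; the caller can go do other work
    # A writable socket may still have failed, so check the pending error.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0


# Demo against a throwaway listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)

client = start_connect(listener.getsockname())
connected = finish_connect(client, timeout=1.0)

client.close()
listener.close()
```

Even this small case has several places to get things wrong (the errno values to tolerate, the SO_ERROR check after the socket becomes writable), which is the kind of detail a shared event loop handles once for everyone.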

This list is a great example of why projects like Tornado really should use Twisted.  Tornado implements some innovative web-framework stuff, but absolutely nothing interesting that I can see at the level of async I/O.  Using Twisted would have allowed them to focus exclusively on cool web things and left the never-ending stream of incremental surprising platform-specific, only-happens-in-weird-situations bugfixes to a single, common source.

What To Do Now

I hope that someone at FriendFeed will be a little heavier on detail and a little lighter on FUD in some future conversation about Twisted.  However, I'm sure they're going to have their hands full maintaining their own code, so I don't have high expectations in this area.  I'm sure Bret wasn't intentionally slamming Twisted, either; it wasn't like he wrote a big screed about it, he just dropped a few unsubstantiated comments into a much larger post about Tornado. So I just want to be clear: I don't have sore feelings, I don't need anybody to apologize to me or to Twisted.

If any of you out there are fans of both Tornado and of Twisted, it would be great if you could contribute a patch to Tornado which would allow it to at least optionally use Twisted as an I/O back-end.  It would be great, of course, if lots of people interested in web stuff would help us out with our web situation, but supporting the Twisted event loop would be good regardless. It would mean that when people wanted to speak multiple protocols, they wouldn't need to re-write or kludge in their existing Tornado application, so it would increase the chances that we could get some help with our SSH, FTP, IRC, or XMPP code instead.  It would also open up a much wider multi-protocol landscape to users of Tornado, even if Tornado's default mode of operation still used ioloop.py.

Even better would be to hook up something that made a Tornado IResource implementation, so that Tornado applications and twisted.web and Nevow applications could all be seamlessly integrated into one server.

The whole point of Twisted is to have a common I/O layer that lots of different libraries can use, share, and build on, so that we can solidify the common and highly complex abstraction required of a comprehensive, cross-platform, event-driven I/O layer.  In order to realize that vision, we need help not just with the code; we need more Twisted ambassadors to go out into the community and help us integrate these disparate applications, help us find out where real users are finding the documentation inadequate or the organization confusing.

Tornado could be an excellent opportunity for those ambassadors to go out and introduce others to the wonders of Twisted, because its endorsement from FriendFeed guarantees it an audience of tens of thousands of developers, at least for its first few months of life.  If you've shied away from contributing to Twisted itself because of our aggressive testing and documentation requirements, well, Tornado apparently doesn't have any, so it would be a great place for you to start :).

23 comments:

Michał Pasternak said...

Twisted is hard to learn? I don't think so -- I'd ignore "steep learning curve" argument totally. It takes some time to understand async programming and deferreds, just as it takes some time for one to understand C pointers.

BUT: it took me a very long time (too long :) to find that Twisted supports a variety of reactors (epoll, kqueue, win32). This is a good point, especially for someone concerned about performance: "why should I roll out my own solution, when I can use platform-specific optimizations already?" and "with Twisted, our software works at maximum pace on Linux and FreeBSD". You should advocate platform-specific stuff somewhere, perhaps on the main web page, it is important.

... and, while we're at the webpage, maybe it would be wise to redesign it a bit (keep the graphics theme, it's cool) and use something else, than Trac, for it? I started using Django about a month ago and I think it's great for creating web pages.

Twisted may seem overkill for simple tasks, Twisted integration with GUI toolkits may be a bit hard to do properly, but the point is, that people are doing crazy stuff with Twisted on servers. To give some examples: Orbited & Morbidq, Chesspark, integration with Django over RPC: http://anirudhsanjeev.org/tutorialhow-to-django-comet-orbited-stomp-morbidq-jsio/ . You should point that out too. Even if some big companies use Twisted, it may be more important for people like me to see what software runs on Twisted.

Michał Pasternak said...

And, just one more thing: I'd like to see how to access SQL database using Tornado _and_ I'd like to see that Tornado framework rewritten in Twisted. People care about web nowadays, people want to start with the web in a simple way. Just like this thing: http://www.sinatrarb.com/ . Maybe it could be done just by adding a few procedures and decorators to nevow.util :-) but it should be done, I think -- just to show ppl new to Twisted, that Twisted _can_ be simple.

Michał Pasternak said...

HA! The web part of Tornado is already ported to Twisted

terrycojones said...

Hi Glyph

I blogged some quite similar reactions: Facebook release Tornado and it’s not based on Twisted?.

Regards,
Terry

Gregg Lind said...

Glyph, I really don't mean to be a prick about this stuff, and I can understand why it might be very sensitive, given the amount of work you've put into Twisted. So please understand this as coming from a place of wanting Twisted to be more awesome.

Twisted rocks: it's huge and robust and tested and scalable, but that's not all that matters.

*The 0-step matters*. Having a trivial Twisted application look trivial *matters*. The "hello,world" at http://www.tornadoweb.org/ is a dozen lines, just like the one in web.py. If twisted can do this (through some sane default web classes, decorators, etc.), then point it out *right on the front page*. Perception matters, and having a prominent snippet, saying "YOU CAN WRITE IN TWISTED, and IT'S SIMPLE, here's how", is more important than some of the posted news.

I felt like when I learned twisted, it was down in the gory details right away. I didn't want to *understand Twisted*, I wanted to *get an asynchronous app going*. That's the 0-step, and the 0-Step matters.

As has been pointed out in the past, a guiding Python philosophy is "simple should be simple, hard should be possible". Twisted gets the "hard" end right. Make the easy end easy!

GL

glyph said...

@Gregg,

Of course I understand that.

The big problem here is that since Twisted serves different audiences, the "0th step" is different depending on what you're trying to do. Therefore Twisted needs to have different top-level web sites ("landing pages", in marketing jargon) that showcase different features. If you show up wanting to write an IRC bot, you don't want a web templating example on the front page; and if we tried to cram everything in there, it would be a cluttered mess, probably worse than it is now.

But this kind of multidisciplinary marketing effort is really just too much work for the current team, and few of us are interested in that kind of stuff anyway. I'm probably among the most interested, but I'd still rather review code and get new features into Twisted, and I think that's a better use of my skill set.

Perhaps you'd like to join the team and help us clean up our introductory experience?

Kevin said...

@glyph - Gregg has a point that I think you are ignoring. Being able to write simple applications in a small amount of code is really valuable. This is an important area where Tornado is superior to Twisted. If you say you aren't interested in that, it's fine, but don't be surprised that FriendFeed decided to build their own rather than use Twisted if that functionality is unimportant to you.

Kyle said...

Twisted has powered or currently powers almost all the core components of Justin.tv, the largest live video website on the internet.

We use it in our chat servers, video servers, and in the web stack. In fact, every single web page served by Justin.tv passes through a caching engine we built using Twisted (open source variant is at http://code.google.com/p/twicecache/).

However, as glyph pointed out - twisted.web isn't in a great state right now (we do not use it very much), but the Twisted event framework is rock solid.

Chris said...

looks like your request for a patch was answered already -
http://dustin.github.com/2009/09/12/tornado.html

glyph said...

@Kevin,

First of all, no, I didn't miss that point. My reaction was based on the fact that it is already possible, most of the time, to get things done in a very small amount of code in Twisted; the problem is that people don't know how to write that small amount of code. c.f. http://twitter.com/glyf/status/3936399244

Second, Twisted vs. Tornado in this regard is apples vs. oranges. Twisted is a web server; Tornado is a web framework. In other words, you can't really generate web pages with Twisted. The appropriate comparison here would be Tornado vs. Nevow. In that case, Tornado probably does win in the ease-of-use fight, because bootstrapping Nevow is unfortunately tedious. Once you've gotten the boilerplate to configure a server together, or if you use 'twistd athena-widget', Nevow can be equally brief, so the challenge there is really just to have a deployment option for Nevow at a higher level than "fully integrated web server".

But this post wasn't about Tornado and Nevow. While I think Nevow is great, I can understand lots of reasons why they might not have wanted to use it. This is why Twisted is segregated into layers: twisted core is for networking, twisted web is for web serving, and nevow is for templating. If you don't like a layer you can replace it.

Gregg Lind said...

@Glyph,

All of your points (and market-speak aversion) are quite true! I know you want people to use Twisted for *all* of the glorious tasks that it can be used for (which are many!). Finding the right way to show that simple web frameworks can be built around it is non-trivial, for sure. I'll devote a few cycles to thinking more about it. Also, by "main page" I didn't necessarily mean the absolute front page, more that getting some code examples out front (especially if they're simple) is a goodness. I know it's getting there :) Thanks for the attention!

Folletto Malefico said...

Twisted has always been my first choice for network prototypes, so I'll try to add my .02€ to the comments... maybe it helps. :)

Since I don't currently have a Twisted project on my CV (I'm using it only in prototype apps, tests and exercises), I can't speak at the protocol level or the benchmark level. I'm quite sure that Twisted is protocol rich, but I don't know about benchmarking.

Suggestion #1: a dedicated, maybe independent, portion on the website built upon a solid and agreed benchmark to test *AND COMPARE* speeds. :)
The first thing that anybody asks when choosing a server/framework is: "Is it fast? How much? And versus XXX how does it compare?". Create a page and automate benchmarks: it will be a great feat.

Communication wise, I add my +1 to some effort in redesigning the website to target different audience. While it's true that if you want to build an IRC bot you don't want an HTTP example in home, you can still communicate both of them.

Also, if anybody gets that "it's simple to do HTTP", looking at the code, it's an easy jump to any other protocol, since anybody will just assume that "it will be the same".

Suggestion #2: work on communication. Also, avoid Trac, it limits too much your website choices. (disclaimer: I hate the usability of Trac, so I might be biased here).

Last point: Twisted needs a bit more ease of use. I'm unable to work full time on many projects at once, but I've got many friends and colleagues that contact me just to revise the code of their API/framework/infrastructure. It's all about what I call "Code Usability", it's a task that requires different skills. Some principles are that defaults are critical, characters count and the form must reflect the functionality.

Suggestion #3: work on Code Usability.

~

I already read that some suggestions are too much work for the current team... And I understand that I'm just adding something to comments you've already received.

...but maybe it still helps. :)

I could just add: thank you all. Twisted is a great project, and it truly deserves more.

Vinay Sajip said...

Glyph, you're complaining that someone didn't try to talk to you about some software you wrote, that they were a bit sniffy about? Welcome to the club. You did pretty much the same thing about Python logging (which I wrote) in this post of yours.

In it, after presenting a long wishlist about what a logging system should have, you concluded:

"What Can Do This?

I'm not aware of any logging system that can already do these things. We'd have to write a new one. This essay was largely composed due to my desire to understand what I thought a "good" logging system would do. It might be too ambitious."

You never even mentioned Python's logging package which has been around since Python 2.3 and has many satisfied users, just like Twisted. It's as if it never even existed, or was perhaps beneath your notice?

I commented on that post, saying that IMO Python logging addressed many of your wishes, and ending with:

"If there is a specific scenario where you feel Python logging is not up to the mark, I'd be interested to know more about it - I've asked Blogger to notify me of follow-up comments to this post."

Guess what - I have had no feedback from you.

Do you know the old saw about treating people the way you'd like to be treated yourself? ;-)

Zooko said...

Heh heh heh. I like Twisted and I like Glyph, and I think a lot more people should use Twisted, so please take this in a good spirit: the fact that Glyph never wrote back to Vinay Sajip is not only one of the reasons that the Python logging module isn't better, but it is also one of the reasons that more people don't use Twisted.

Namely, because Twisted comes with its own logging module (twisted.python.log), and that makes it seem more "heavyweight" to new users. ("It should be a library", they say, "Not a framework.". I'm not entirely sure what they mean when they say that, but I suspect that the presence of non-event-I/O-specific features such as logging is part of it.)

It also has its own command-line parser (twisted.python.usage), its own version-number parser (twisted.python.versions), etc.

Now, I understand some of the reasons why the Twisted folks chose to do it this way -- in some cases there *wasn't* an equivalent in the Python standard library or in a third-party package when Twisted invented their own. In most cases, there is some feature or API design that the Twisted people prefer and that the standard or 3rd-party package didn't provide. In almost all cases the Twisted implementation is higher-quality in the sense of not making changes to it without accompanying unit tests.

It would take an awful lot of effort to refactor Twisted to depend instead on the standard logging module, the standard command-line parsing module (or a good 3rd-party command-line parsing module such as my favorite, argparse), etc. It would take even more effort to push your desired improvements and changes upstream into the hands of the maintainers of those other packages.

Nonetheless, if you're going to choose to make your own tools and utilities for reasons such as these, you should be prepared to pay one of the costs -- your project will appear to be heavier-weight to other people, and some of those people will therefore choose not to use it.

Vinay Sajip said...

From Zooko's post, one could perhaps infer that there is a smidgeon of a hint of a Not-Invented-Here mindset somewhere in the Twisted development process. You do get that sometimes with smart people.

Perhaps asynchronous programming is so different that lots of things need to be reinvented to work with that mindset.

Anyhow, it seems like the FriendFeed guys had a look at what was available (mentioning Twisted in particular) and then decided to invent their own. Perhaps not that different an approach from the Twisted one.

From a recent, admittedly non-scientific benchmark by Antonio Cangiano of several webservers including Tornado and Twisted.Web, it looks like there is a prima facie case for the Tornado guys having made the right decision - though perhaps a more studied comparison will not put Tornado so clearly ahead.

PS: I've no real beef with the Twisted guys, and I quite like the software. And not to harp on about logging, but as Zooko mentioned it in his post:

PPS: I'm sure the Python logging module could be improved, and I've always been willing to listen to specific suggestions for improvement, and since it was introduced into Python numerous improvements to code, tests and docs have been added because of user feedback.

PPPS: I just noticed that Tornado uses Python logging :-)

glyph said...

@Vinay,

My issue with the developers of Tornado was that while Twisted already existed, they wrote something new; when announcing the new thing they prominently made some vague and unhelpful criticisms of Twisted which are impossible to address.

Twisted's logging module predates the Python logging module by quite a bit, so if anything you did the same thing, since I don't believe you spoke to us before writing it ;-). Of course Twisted's logging module isn't particularly featureful, so I didn't expect it.

Despite the fact that Twisted came first, my criticisms of the Python logging module were made publicly and specifically, quite a while ago, here:

http://twistedmatrix.com/trac/ticket/307

and here:

http://twistedmatrix.com/trac/wiki/TwistedLogging

It's quite likely that those comments are badly out of date by now. If you'd like to re-open that ticket, it's always possible that we could drop our logging system entirely in favor of Python standard logging :).

Regarding the specific comment that you wrote: reading it more closely now, I realize that it illuminates something I didn't realize about the python logging module: I didn't know it supported structured objects. That's where a huge proportion of my issues with it come from. But the stuff I was talking about in the post was things like tools for developers to ask "what warnings is my code emitting" and users to see a desktop notification when something bad happens. I didn't see anything like that in your comment.

More importantly, I didn't actually write any new code after making that post. Rest assured that if I were going to, I'd take a much deeper look at stdlib logging first.

glyph said...

@Vinay,

Regarding the performance stuff, specifically...

Jean-Paul Calderone has done a couple of benchmarks of his own that give somewhat different results. I keep bothering him to release them but he hasn't yet.

Also, I don't trust the output of ApacheBench. Although httperf still doesn't output all of its data points so you can graph them, it at least doesn't have strange issues where it stops without completing all of its requests.

Antonio Cangiano said...

Hi Glyph,

I updated my post with the results from httperf.

Vinay Sajip said...

@Glyph,

Thanks for your comment. I wasn't aware of the Trac ticket, as I'm not a heavy user of Twisted.

I've posted some rebuttals on your Trac ticket #307, and reopened it. Looking forward to your feedback.

Nathan said...

@Folletto

I'll echo your #2 comment. I hate Trac. I find it the most unintuitive wiki/bug-reporting-system out there.

But it is slightly better than nothing.

dacresni said...

As for the people who ported the web part of Tornado to Twisted, that was the right thing to do; it's called REFACTORING. If I found a part of a project I wanted but didn't want to deal with an unwieldy codebase, I'd do it too. Twisted's too damn big, I don't want all that. Note how they also glommed Django's template system. Now a document-based database that compiles on MacPPC (I'm not sure what my issue with scons on Mac is), that's what I need.

Alexandre Fiori said...

cyclone is a tornado clone, based on twisted, without twisted web; actually, it's faster than twisted web.
http://github.com/fiorix/cyclone

FoRever_Zambia said...

cyclone is really perfect! we use it in production without any problem! Thanks Alexandre Fiori