A collection of articles, ideas, and rambling from a guy who wrote some software that one time.

Wednesday, February 22, 2006

Push F9 To Continue

F9 is the default keybinding for "run the unit tests for the current file" when using the Twisted Emacs bindings.

As her Windows/Divmod tutorial indicates, Ying is getting started with coding using Twisted again. This is the third time that she's gotten started with it - tries #1 and #2 involved Woven and Nevow, respectively. I was cringing as she started to get her development environment set up, because those previous attempts were shockingly painful. Every step of the way was a challenge - version skew. Path problems. Obscure deprecation warnings. Un-debuggable template interactions. Undocumented assumptions. DOCTYPE declarations.

This time she's got a bit more of a domain-model problem to attack. It was staggering to me how different it was. While getting started with a web-based development project with Twisted was painful, and it wasn't necessarily clear how to proceed, getting started with a domain model was trivial. With a 4-line unit test template, she was productive within 5 minutes. Of course, having yours truly in the room (and emotionally dependent upon you) when you start to work on a Twisted program helps, but I have been largely uninvolved. In previous attempts she would ask a question every 5 minutes, and I would start the answer with "You have to understand the history of the multiple projects involved here..." or "There are some unresolved issues...". Now she asks a question once per day or so, and the answers are short, direct sentences like "use twisted.web.client.getPage" or "return a Deferred from your test method".
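For a sense of scale, a four-line test template could look something like the sketch below. The actual template isn't reproduced in this post, so the class and method names are made up, and the standard library's unittest stands in for twisted.trial.unittest so that the sketch runs without Twisted installed.

```python
import unittest

class DomainModelTests(unittest.TestCase):  # hypothetical name
    def test_something(self):
        self.assertEqual(1 + 1, 2)  # placeholder; assert something about your model here
```

With Trial, the test method can also return a Deferred, and the runner waits for it to fire before reporting a result.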

I have long despaired of Twisted development being easy for newcomers, and the community (both friendly and hostile to Twisted) reinforces this assumption. "Asynchronous programming is too hard", "learning Twisted is a serious investment of time", "there's so much you have to know to get started", etc. However, I've now seen that it can be easy. We just need more tutorials that blitz through the introductory steps on a particular platform without explaining anything, and get straight to coding something.

For example, if I had written the tutorial that Ying posted to her blog, I probably would have explained each step, so as to give users maximal flexibility in their setup. "Make a folder to hold your projects. We'll call this the 'combinator container' folder from now on. You can place this anywhere on your sys.path. Now, make a folder /path/to/combinator-container/Divmod ..." Ying chose a much more direct route. Nobody really cares what the folder is named, they just want things to work. "Make a folder C:\Projects. Now make a folder inside that called Divmod. Then run 'svn co ...'"

Once you have gotten to the point where you are hitting F9 every five minutes, watching your code run, and fixing problems, you don't really care that you don't know how to put C:\Projects at H:\Documents and Settings\%USER%\My Documents\Programming\MyNiftyProject\Infrastructure. You don't care that you had to run svn command lines rather than installers. You certainly don't care why version 0.9.8a of OpenSSL is required or where ZopeInterface was installed.

That last step - where you just push a button, rather than starting up a terminal and typing a command line - is an important one. It makes the experience feel complete, and it removes a point in the development process (that happens every 5 minutes or so) where a new Twisted user thinks, "this is a pain in the ass, these tools are terrible".

twisted-dev.el is the best-kept secret of Twisted developers. It needs to be more front-and-center. We should eschew the bits that never really worked, and are no longer maintained, i.e. the PB/Emacs integration, and include the core one-button-unit-test functionality with Twisted itself.

Friday, February 17, 2006

The Opposite Test

Whether or not I've used Macs, I've always been a big Guy Kawasaki fanboy. Getting someone with such integrity and seemingly boundless enthusiasm to evangelize for them was one of the best things Apple ever did.

Since I am going to be "evangelizing" quite a bit while in Austin, I am thrilled to see that recently Mr. Kawasaki started blogging, sharing his wisdom about evangelizing, and he's saying things that I already agree with.

He's a big fan of top ten lists, and everything I've read so far I agree with. One thing stuck out for me in particular though, because I just said something similar about open source project descriptions:

"Apply the opposite test. How many times have you read a product description like this? “Our software is scalable, secure, easy-to-use, and fast?” Companies use these adjectives as if no other company claims its product is scalable, secure, easy-to-use, and fast. See if your competition uses the antonyms of the adjectives that you use to describe your product. If it doesn't, your description is useless. For example, I've never seen a company say that its product was limited, full of leaks, hard-to-use, and slow."

I wouldn't mind so much if people wrote such descriptions and then moved to substantiate them. Sometimes it's really important to have software that is scalable, easy-to-use, and fast. Sometimes you really do want a fast, clean, dark theme.

Presumably, in such a situation, your users know how fast, or how clean, or how dark they want the theme to be - how many users it has to scale to or what their training costs are going to be. Talk about that. Measure it, and write your advertising like a thesis you are going to have to defend. If you're writing such literature, even if sales isn't your job, you are in the role of a salesman and your readers know it, even if you don't. That means they are going to assume that every single word you say is a lie. Provide examples, show screenshots, compare to other things that they might be familiar with. Try to avoid graphs without meaningful numbers and units - for example, don't do this: Apple's Intel Core Duo site includes an "application performance" graph that has bars that say "4.1x faster", but doesn't explain the benchmarking very well, and while there are 4 tests there is only one "baseline" bar. (Also, they do say that they used a beta of the Cinebench software, which means that the results aren't even going to be comparable to something that customers looking at this site could run themselves on their own hardware, even if they were available, which they aren't. But I digress.)

I'm picking on Apple because I'm considering maybe buying a MacBook this year, but the open source world has even more to learn about this than corporate marketroids. Every open source database project claims to be efficient, but how efficient? At least Oracle provides benchmarks during sales pitches. Let's say that I am going to build a system where database efficiency is really important - what is the most efficient open source database? Even the Open Source Database Benchmark site doesn't list results - their Project Status page (which is admittedly ancient, but they are still the first Google hit for "open source database benchmarks") is only detailed enough to show that certain databases work with the benchmark, if you want to run it yourself.

When you're describing your open source project, think about your users, not about you. I think that the temptation to say software is "efficient" or "scalable" comes from the fact that programmers have to spend time doing optimizations and thinking about scalability. Even if it takes the bulk of your time, that's a base-level requirement, not something that's going to make your project better than its competition. Sure, if you walk up to a database user and ask them, "What would influence your choice of a database on future projects?" they might say something about efficiency, but if you think about what a database user is going to do with it, how they are going to experience the utility of the database both during development and on a running service, they are not going to be benchmarking and tuning constantly. They are going to be debugging problems with the database, when they inevitably get something wrong. People don't like to think about themselves making mistakes, or the database failing, but I think you will find people responding more positively to a database that provides gobs of useful information about what's going on than a database that is 8% faster than its nearest competitor.

While it's not perfect, I am a big fan of SQLite, and I think that (in addition to being excellent technology) the "marketing message" on the website is very good. It begins by describing the database as "small, zero-configuration, self-contained", which is more interesting than the performance characteristics - to most users. I happened to be concerned about both issues, and they provide a long, detailed page on database performance which, despite being out of date, clearly indicates that it is not slow.
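Measuring is usually cheaper than it sounds. As a rough sketch, the standard library's sqlite3 and time modules are enough to turn "fast" into a number with units. The table name, row count, and helper name below are arbitrary choices for illustration, and an in-memory database says little about a disk-backed workload, so treat the output as an example of the kind of figure to publish rather than a result.

```python
import sqlite3
import time


def inserts_per_second(rows=10_000):
    """Time `rows` single-row INSERTs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    start = time.perf_counter()
    with conn:  # commit once, after all of the inserts
        for i in range(rows):
            conn.execute("INSERT INTO t (payload) VALUES (?)", (f"row {i}",))
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count, count / elapsed


count, rate = inserts_per_second()
print(f"{count} rows inserted at {rate:,.0f} inserts/second")
```

A number like that, published along with the hardware and the script that produced it, is something a reader can check and argue with.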

Since I have been thinking about these sorts of issues, I am starting to formulate a plan for Twisted's marketing and future directions, too, but that's enough blogging for one night. Watch this space...

Wednesday, February 15, 2006

My Girl Loves Me

Best Valentine's Day present ever:

Twisted / Divmod development on Microsoft Windows: environment configuration HOWTO.

If you want to work with Divmod's code on Windows, Ying's recent blog entry on the topic is probably the best thing you can read on the 'net right now. I don't know if she's going to keep it updated, so maybe someone should copy it onto the wiki...

Saturday, February 11, 2006

Python Logitech G15 Keyboard Multiplexing Daemon

Over the weekend, I discovered that there are Linux drivers for my keyboard. My keyboard has a small programmable LCD, which I had, until now, been unable to hack in Linux.

Unfortunately this will only be interesting to you if you buy one of these keyboards, but this morning, it only took me an hour or two to put together a Python Logitech G15 keyboard daemon, which replicates most of the functionality from the included Windows drivers. It also provides a really simple Python API for hacking the display.

% cd g15lcd-1.2-pre0
% tar xvjf .../pyg15.tar.bz2
% python run.py


To try it, you will need both Python and Twisted.

Sunday, February 05, 2006

Block Syntax for Python

If you can't have the Python syntax you love, love the Python syntax you have

So, apparently, someone proposed this in #twisted yesterday:

def foo():
    @x.doSomethingDeferred().addCallback
    def something(result):
        return result.stuff()
    @something.addErrback
    def somethingElse(f):
        err = f.trap(FooError)
        handle(err)
        return retry()
    return something


I wasn't there to hear it, and I am told that they were shouted down because it is so obviously a horrible idea.

I don't think that it's a horrible idea. I think it is awesome.

It had never before occurred to me that decorators could be used to implement what is almost (but not quite) block syntax for Python. Since block syntax is my all-time highest priority for Python syntax, the fact that this moves things one step closer makes me happy.

In fact, I'd go so far as to say that I'd like to propose that Deferreds gain a few extra features so that you can spell it in a slightly more expressive way:

def foo():
    @x.doSomethingDeferred()
    def something(result):
        return result.stuff()
    @something.Except(FooError)
    def somethingElse(err):
        handle(err)
        return retry()
    return something


This idiom effectively turns @-at-function-scope into a symbol meaning "do something asynchronous". Hooray! The only reaction I've heard to this so far is JP and Itamar, both calling for my immediate assassination. I'm sure that will be a popular sentiment among Twisted developers. Anyone else who doesn't think I should be killed for echoing this proposal, though?
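To make the proposal a little more concrete, here is a toy sketch of what those extra features might look like. Nothing in it is Twisted's API: MiniDeferred is a synchronous stand-in written only for this example, and __call__ and Except are the hypothetical additions.

```python
class MiniDeferred:
    """Toy, synchronous stand-in for a Deferred; not Twisted's implementation."""

    def __init__(self):
        self._handlers = []  # (callback, errback) pairs, in the order added

    def addCallbacks(self, callback, errback):
        self._handlers.append((callback, errback))
        return self

    def __call__(self, f):
        # Hypothetical: "@deferred" registers f as a callback and hands back
        # the deferred, so the decorated name can be decorated onto again.
        return self.addCallbacks(f, None)

    def Except(self, *excTypes):
        # Hypothetical: "@deferred.Except(SomeError)" registers f as an
        # errback that only handles the listed exception types.
        def decorator(f):
            def errback(exc):
                if not isinstance(exc, excTypes):
                    raise exc
                return f(exc)
            return self.addCallbacks(None, errback)
        return decorator

    def fire(self, result):
        # Run the chain: plain results go to callbacks, exceptions to errbacks.
        for callback, errback in self._handlers:
            handler = errback if isinstance(result, Exception) else callback
            if handler is None:
                continue
            try:
                result = handler(result)
            except Exception as e:
                result = e
        return result


class FooError(Exception):
    pass


def foo(d):
    @d
    def something(result):
        if result < 0:
            raise FooError(result)
        return result * 2
    @something.Except(FooError)
    def somethingElse(err):
        return "recovered"
    return something


print(foo(MiniDeferred()).fire(21))  # 42
print(foo(MiniDeferred()).fire(-1))  # recovered
```

The part that makes the idiom work is that each decorator returns the deferred rather than the function, which is why something can be decorated onto a second time.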


Update: The first example I posted is a syntax error - due to a misfeature of decorators that, ironically enough, another Twisted developer objected to when it was introduced. As this is a horrible language abuse to improve readability rather than to some other end, I'm not sure I could seriously argue that the idiom should be @apply(lambda: x.doSomethingDeferred().addCallback). I suppose I'll have to go back to getting some real work done today instead...
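One spelling that the decorator grammar does accept is to bind the Deferred to a name first, since a plain dotted name such as d.addCallback is allowed after the @. A minimal sketch, with a hypothetical FakeDeferred standing in for the real thing so that the example runs without Twisted:

```python
class FakeDeferred:
    """Hypothetical stand-in: just enough of a Deferred to show the syntax."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, f):
        self.callbacks.append(f)
        return self  # like Twisted's Deferred, return self so calls can chain

    def callback(self, result):
        for f in self.callbacks:
            result = f(result)
        return result  # returned only to make this toy easy to inspect


def foo(d):
    # "@x.doSomethingDeferred().addCallback" is rejected by parsers before
    # Python 3.9 (PEP 614 later relaxed the rule); a dotted name is fine.
    @d.addCallback
    def something(result):
        return result.upper()
    return something  # the deferred itself, because addCallback returns it


d = FakeDeferred()
assert foo(d) is d
print(d.callback("stuff"))  # STUFF
```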

Saturday, February 04, 2006

No PyCon for Divmod this year :-(

Divmod won't be going to PyCon this year.

Unfortunately, even a small conference like PyCon can cost too much when you are an un-funded startup. When we evaluated whether or not it was a worthwhile idea to go, we just couldn't make it add up - by the time PyCon rolls around, we are going to need to be doing a lot of deployment, marketing, and sales work, and PyCon is more like a vacation than a sales venue for us. In previous years we've been able to rationalize the cost because we were mostly on the east coast, and all in the US, so it was generally a cheap trip. This year we have an overseas developer and the conference is on average farther away from all of us.

I also wanted to say something publicly because this isn't a comment on the conference - I'm definitely going to miss going, and I am sure it will be a lot of fun. I apologize to anyone who thought they were going to see us there. We were planning to go before we re-evaluated our priorities. I hope we will be there again next year.

Ironically I will be on a business trip to Texas anyway at around the same time as PyCon - in Austin, which is to say, the part of Texas that isn't totally awful. (Seriously, I know the conference had to move because it was outgrowing the venue, but Dallas? I'm still confused about that.) I hope to post something about the super secret projects I am going there to discuss, but don't hold your breath - they are super secret after all.

Wednesday, February 01, 2006

Slaying Medusa

That is not dead which can eternal lie,
and with strange aeons even death may die
- HPL


Apparently some guy is trying to resurrect Medusa. I wasn't going to say anything until he mentioned Twisted directly, calling it "unprofitable", and "a complex maze of orthogonal but impractical interfaces", but IT'S ON NOW.

Medusa was great for its time. I read the whole thing before I started work on Twisted. In fact, Medusa can be credited with opening my eyes to the problems with threads. Somewhere, locked in a lead drum, encased in concrete, the very first version of Twisted Reality for Python can be found, using threads for concurrency and pickle for serialization.

However, even when I was starting with Twisted - circa 1999 - Medusa was already showing its age.

The main problem with Medusa is that it does too little. For example, Windows sockets are different from UNIX sockets in a variety of ways you probably aren't going to anticipate if you are calling self.recv and such in an asyncore.dispatcher. Those are just the differences in sockets - Twisted handles threads, pipes, and processes too, for which a cursory look at the new "Allegra" reveals only rudimentary support.

95% of the time, you won't notice these problems. As they're related to operating-system error reporting, they tend to show up only when your program is under load. In other words, you won't notice any issues with your program until a lot of people are using it and there is a huge amount of pressure on you to fix it immediately. Luckily Allegra has no automated tests, so you won't have to be troubled by finding out about bugs too early. Twisted has more than 2500 tests run on 4 different platforms with each commit.
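As a rough illustration of the kind of error handling a hand-rolled select loop ends up needing, here is a sketch of a defensive read. The helper name and its return convention are made up for this example, and it covers only a few of the conditions a real reactor has to deal with.

```python
import select
import socket


def read_available(sock, bufsize=4096):
    """Return bytes read, b"" if the peer is gone, or None if nothing is ready."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        return None  # non-blocking socket with no data yet: try again later
    except ConnectionError:
        return b""   # reset or aborted connections are treated like a close


a, b = socket.socketpair()
a.setblocking(False)
print(read_available(a))       # None: nothing has been sent yet
b.sendall(b"ping")
select.select([a], [], [], 5)  # wait until the data is readable
print(read_available(a))       # b'ping'
b.close()
select.select([a], [], [], 5)  # a closed peer also makes the socket readable
print(read_available(a))       # b'': the peer closed
a.close()
```

Every branch in that helper is a decision that a bare recv call leaves to the caller, and the wrong answer tends to show up only in production.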

Don't believe me that there will be issues with your super-simple cross-platform select loop? By way of an example, after years of maintaining their own networking core, BitTorrent has started using Twisted, with the unobtrusive changelog message of "TCP Stack flaking out bug fixed, using Twisted".

Even given the testing situation, Medusa (I'm sorry, "Allegra") is, indeed, quite a bit simpler than the full Twisted suite, and an adequate solution to 95% of the asynchronous-socket-programming problems out there. However, it's only adequate, whereas I think Twisted is excellent.

As per my last post, there is a lot of complexity in some projects surrounding Twisted, and some of it is unnecessary. That complexity isn't there if you are just trying to get basic asynchronous I/O going, though. If you are considering Medu^HAllegra for some reason, have a look here first: Writing Servers with Twisted. If you can't figure that out, I've just saved you some time: you should just stop trying to write networked programs. "a complex maze", "impractical"??? The simplest working server is just 4 lines of code: an import, a class statement, a method definition, and a method body. You can really be productive that fast. I don't think it's even possible to be simpler than that in a Python framework.
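For reference, those four lines look roughly like the echo protocol below. The try/except wrapper is not part of the four lines: it is a stub added here (and not part of Twisted) so that the sketch can be exercised on a machine without Twisted installed.

```python
try:
    from twisted.internet.protocol import Protocol
except ImportError:
    # Stub for running this sketch without Twisted; not Twisted's class.
    class Protocol:
        transport = None


class Echo(Protocol):
    def dataReceived(self, data):
        self.transport.write(data)  # send back whatever arrived
```

Hooking the protocol up to a port takes a factory and a call to the reactor as well, which the linked document walks through.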

A similar document for Medusa, Asynchronous Socket Programming, has more than twice as many words, and the simplest example is more than 4x longer.

As to "unprofitable", well, let me just direct you to a page that describes people's profitable experiences with Twisted: including the quote "The quality of the Twisted networking core is unmatched in the open- or closed-source arenas".

I'm glad to see someone doing a project that aims to do something similar to Twisted, since that indicates it's a thing that people need doing. I'd love to push for some interoperability standards if a different framework were to emerge that had different implementation strengths than Twisted does (and goodness gracious, it could certainly be faster. Twisted can only handle a few tens of thousands of requests per second on high-end hardware; C++ servers are in the millions these days).

However, there are a few things such a framework would have to get right first:
  1. Write some automated tests. Seriously, it's 2006 already, unit tests are an accepted best practice pretty much anywhere that people care about writing programs that work even some of the time.
  2. Be clear about what you're trying to do differently, or don't say anything at all. Here's a hint: any phrase involving the word "lightweight" is not specific. Reading the descriptions of Python frameworks (although this mainly applies to tiny one-off web frameworks) I am reminded of the descriptions of WindowMaker themes at the beginning of the themes craze five years or so ago. "It's a lightweight, simple, fast, clean, dark theme, with a picture of Neo on the desktop" probably described 4 out of every 5 themes to hit the site. The first four adjectives regularly show up in descriptions of frameworks. Where were all the heavy, slow, dirty frameworks that these are in response to? If these are valid design tradeoffs, why don't people ever advertise as the opposite? Those four adjectives are basically meaningless - when an author of some software says it's "fast" and "lightweight" they just mean "I know what it does, so it doesn't surprise me when it's slow, and I can get things done quickly with it, because I wrote it." If it were measurably fast or small they would be quoting statistics at you about how many requests per second it can perform or how small the image can be compressed.
  3. Have an application. It looks like Mr. Allegra is, in fact, driving the requirements for his new-old framework from an application, and that's good. If his approach doesn't work well, he'll find that out, rather than assuming that it is fine.
  4. Finally, specific to asynchronous I/O frameworks: separate your transport from your protocols. Don't force your users to fetch data themselves from the OS; there is just no reason for that. It is a real pain to implement a low-level enough API in Twisted for compatibility, if your users expect to diddle the sockets themselves.
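The last point is easier to see in code than in prose. The sketch below is a toy and not Twisted's implementation, and every class name in it is made up for illustration. The protocol only ever sees bytes and a transport with a write method, so the same class can sit on top of a list in a test or a socket in production.

```python
import socket


class UppercaseProtocol:
    """Protocol logic only: it never touches a socket."""
    transport = None

    def dataReceived(self, data):
        self.transport.write(data.upper())


class MemoryTransport:
    """Collects writes in a list; handy for tests."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class SocketTransport:
    """Owns the socket, does the reading, and hands bytes to the protocol."""

    def __init__(self, sock, protocol):
        self.sock = sock
        self.protocol = protocol
        protocol.transport = self

    def write(self, data):
        self.sock.sendall(data)

    def doRead(self):
        data = self.sock.recv(4096)
        if data:
            self.protocol.dataReceived(data)


# Same protocol class, two different transports.
p = UppercaseProtocol()
p.transport = MemoryTransport()
p.dataReceived(b"hello")
print(p.transport.written)  # [b'HELLO']

client, server = socket.socketpair()
t = SocketTransport(server, UppercaseProtocol())
client.sendall(b"hello")
t.doRead()                  # blocking read of the bytes sent above
print(client.recv(4096))    # b'HELLO'
client.close()
server.close()
```

Because the protocol never calls recv itself, the framework is free to change how bytes are fetched from the OS without touching user code.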