Structured Python Editor

Thursday April 10, 2008
If you're not already familiar with Subtext, you should probably watch this video to learn about it.

I'm not a big fan of the Subtext programming model, but it does convey one idea that I really, really like: programs are structured data. This is a very powerful idea, and I think it's a pity that it hasn't really caught on.

Most programmers subscribe to the idea of model/view separation; the phrase has come back into vogue with the popularity of frameworks like Django and Rails. But programmers are fans of it only when it comes to "end-user" applications. For our own tools, rigidly gluing the model to the view (and both the model and the view to the persistence format, execution model, and innumerable other details) is the order of the day. Indeed, Python's own popularity is due in large part to the relative beauty of its syntax, which is to say, its view.

One of the problems this causes is a language gap. Guido mentioned this in his Py3K talk: different programming communities are already choosing identifiers based on their natural languages. The conflation of real names and binding names also creates a more subtle problem in Python: when you want to deprecate a name, say twisted.web.server, you have to choose another name, probably one that isn't as good. If the binding name were, as in Subtext, an internal identifier rather than the user interface accessible to everyone, this would be much easier. For that matter, a large part of the Py3K effort itself is a change to Python's user interface; if Python were an interactive program with a separated model and view, that interface could change without everyone's code having to change at the same time.
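To make the binding-name idea a little more concrete, here is a toy model. It is not Subtext's actual design, just a sketch under the assumption that every definition gets a stable internal id and call sites point at that id; `Definition`, `Call`, `render`, and the names `server`/`site` are all hypothetical. Renaming then only touches the view, and every reference follows along.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical id source: each new definition gets the next integer.
_next_id = itertools.count(1)


@dataclass
class Definition:
    """A function in the model: a stable internal id plus a user-facing name."""
    display_name: str
    id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class Call:
    """A call site stores the callee's id, never its display name."""
    target_id: int


def render(call: Call, model: dict[int, Definition]) -> str:
    # The view looks up the current display name only when drawing the code.
    return f"{model[call.target_id].display_name}()"


server = Definition("server")
model = {server.id: server}
call = Call(server.id)
print(render(call, model))

# Rename in the view; the call site's link to the definition is untouched.
server.display_name = "site"
print(render(call, model))
```

Because the call site never stored the old name, a deprecation could keep the old display name around as an alias without inventing a second binding.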

IDEs like PyDev for Eclipse and Wing IDE don't really address the problem of "program as bag of bytes". They provide tools, it's true, but those tools still treat a program as semi-structured information. One of the things you do most frequently in an IDE like this is type some code, which at least temporarily puts your program into a totally invalid state. As you're typing "def ", your module is syntactically invalid. Once you've finished adding arguments and a docstring or method body, it's valid again, but only until you make your next change. If you're using a tool like Inkscape to edit something other than a program, every state you pass through in your drawing (add a line, change a gradient, resize a shape) would be a valid SVG document if you saved it. This is one of the reasons that I don't really use IDEs; despite their features, the core of the experience is still hammering away on a bunch of text files, and for that, it is very hard to beat Emacs.
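You can watch this happen with the standard library's parser. The sketch below feeds successive text states of a hand-typed function to `ast.parse` (assuming a modern Python 3; an IDE's own parser may be more forgiving). The helper name `is_valid_python` and the sample states are illustrative.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as a complete Python module."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass, so it's caught too
        return False
    return True


# Successive text states while typing a function definition by hand.
states = [
    "",
    "def ",
    "def greet(",
    "def greet(name):",
    "def greet(name):\n    return 'hello ' + name\n",
]

for state in states:
    print(repr(state), is_valid_python(state))
```

Only the first and last states parse; everything in between is a document that no structured tool could load.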

IDEs aren't the only tools that could benefit from a truly structured interpretation of code. Version control systems, for example, could benefit immensely from having higher-level operations. How often do you really want to know "who changed this line of code last"? I don't know about you, but personally, I want to ask questions like "when was this method defined" and "was it ever moved from another module". These questions are difficult or impossible to ask of modern version control systems (even the really good ones).
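As a rough illustration of asking "when was this method defined" structurally rather than line by line, here is a sketch that parses each revision of a file and looks for a dotted name such as `Server.render`. The `history` list of `(revision id, source)` pairs is hypothetical; a real tool would pull those from the version control system, and detecting moves between modules would need cross-file comparison that this sketch does not attempt.

```python
import ast
from typing import Iterable, Optional


def defined_names(source: str) -> set[str]:
    """Collect dotted names like 'Server.render' for every class and function."""
    names: set[str] = set()

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                qualified = prefix + child.name
                names.add(qualified)
                visit(child, qualified + ".")
            else:
                # Descend through if/try/with blocks without changing the prefix.
                visit(child, prefix)

    visit(ast.parse(source), "")
    return names


def first_revision_defining(
    name: str, revisions: Iterable[tuple[str, str]]
) -> Optional[str]:
    """Return the id of the first revision whose source defines `name`."""
    for rev_id, source in revisions:
        if name in defined_names(source):
            return rev_id
    return None


# Hypothetical history of one file, oldest first.
history = [
    ("r1", "class Server:\n    pass\n"),
    ("r2", "class Server:\n    def render(self):\n        return ''\n"),
    ("r3", "class Server:\n    def render(self):\n        return 'ok'\n"),
]
print(first_revision_defining("Server.render", history))  # → r2
```

Note that the edit in r3 doesn't change the answer: the question is about the method's existence, not about which lines were touched last.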

Another issue with programs being effectively unstructured is that they're not discoverable. If you want to draw a line in Inkscape, you don't need to look up the SVG syntax for drawing a line; you can just hunt around for the "line" button. This is especially important for students, who frequently forget basic things like "how do you define a method" or "what does it mean when a function is outside a class" while learning. Squeak addresses this problem, somewhat: there's still a lot of text floating around, but your program itself is a bunch of objects you can look at.

One of my perpetual second week projects is to make an IDE that understands Python as the serialization format of a graph of objects — modules, classes, and functions — rather than text. This could work on existing Python programs, and it wouldn't need to introduce any wacky new programming paradigms in order to do it: simply treat Python as a runtime and a serialization format, and parse/serialize Python code as if it were any other type of data, like Inkscape does for SVG. Since I'm never realistically going to do it, does anyone else want to? Has somebody else done it already?
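The parse/serialize half of this idea can at least be sketched with the standard library's `ast` module, assuming Python 3.9+ for `ast.unparse`. Text is deserialized into a tree of objects, edited as data, and serialized back. `RenameFunction` and the names `helper`/`double` are hypothetical, and the rename is deliberately naive: it does no scope analysis, so a local variable that happens to be called `helper` would be renamed too.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every bare-name reference to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the body
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


source = """\
def helper(x):
    return x * 2

def main():
    return helper(21)
"""

tree = ast.parse(source)  # deserialize: text -> object graph
tree = RenameFunction("helper", "double").visit(tree)
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # serialize: object graph -> text
```

One caveat: `ast` throws away comments and original formatting, so this round trip is lossy. An editor built on the idea would want a concrete syntax tree that preserves them (the third-party LibCST is one option), but the shape of the approach is the same.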