If you're not already familiar with Subtext, you should probably watch this video to learn about it.
I'm not a big fan of the Subtext programming model. But it does convey one idea that I really, really like. Programs are structured data. This is a very powerful idea, and I think it's a pity that it hasn't really caught on.
Most programmers subscribe to the idea of model/view separation; this phrase has especially come back into vogue with the popularity of systems like Django and Rails. But programmers are only fans of this when it comes to "end-user" applications. For our own tools, rigidly gluing the model to the view (and both the model and the view to the persistence format, execution model, and innumerable other details) is the order of the day. Indeed, Python's own popularity is due in large part to the relative beauty of its syntax.
One of the problems this causes is a language gap. Guido mentioned this in his Py3K talk: different programming communities are already choosing identifiers based on their natural languages. The conflation of real-names and binding-names also creates a more subtle problem in Python: when you want to deprecate a name, let's say, twisted.web.server, you have to choose another name, probably one which isn't as good. If the binding name were, as in Subtext, an internal identifier rather than the user interface accessible to everyone, this would be an easier thing to do. For that matter, a large part of the Py3K effort itself is a change to Python's user interface; if Python were an interactive program with a separated model and view, it would be much easier to change this without changing everyone's code at the same time.
IDEs like PyDev for Eclipse and Wing IDE don't really address the problem of "program as bag of bytes". They provide tools, it's true, but those tools still treat a program as semi-structured information. One of the things you do most frequently in an IDE like this is type some code, which at least temporarily puts your program into a totally invalid state. As you're typing "def ", your module is syntactically invalid. Once you've finished adding arguments, and a docstring or method body it's valid again, but only until you make your next change. If you're using a tool to edit something other than a program, like, say, Inkscape, as you move between different states in your drawing (add a line, change a gradient, resize a shape) each one is a valid SVG document if you were to save it. This is one of the reasons that I don't really use IDEs; despite their features, the core of the experience is still hammering away on a bunch of text files, and for that, it is very hard to beat Emacs.
IDEs aren't the only tools which could benefit from a truly structured interpretation of code. Version control systems, for example, could benefit immensely by having higher level operations. How often do you really want to know "who changed this line of code last"? I don't know about you, but personally, I want to ask questions like "when was this method defined" and "was it ever moved from another module". These questions are difficult or impossible to ask of modern version control systems (even the really good ones).
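As a rough illustration of the kind of question a structure-aware version control tool could answer, here is a minimal sketch using Python's standard `ast` module. The names `definitions`, `structural_diff`, and the two toy revisions are hypothetical, invented for this example; a revision is assumed to be a plain dict of module name to source text, which is not how any real VCS stores history.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def definitions(modules):
    """Map (module, qualified name) to a position-independent dump of each definition."""
    found = {}

    def visit(module, node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEF_NODES):
                qualname = prefix + child.name
                # ast.dump omits line numbers by default, so moving a
                # definition around does not change its fingerprint.
                found[(module, qualname)] = ast.dump(child)
                visit(module, child, qualname + ".")

    for module, source in modules.items():
        visit(module, ast.parse(source), "")
    return found


def structural_diff(old, new):
    """Compare two revisions, each a dict of module name -> source text."""
    before, after = definitions(old), definitions(new)
    removed = {key for key in before if key not in after}
    added = {key for key in after if key not in before}
    # A definition that vanished in one place and reappeared, structurally
    # identical, somewhere else is reported as a move.
    moved = [
        (src, dst)
        for src in sorted(removed)
        for dst in sorted(added)
        if before[src] == after[dst]
    ]
    moved_from = {src for src, _ in moved}
    moved_to = {dst for _, dst in moved}
    return {
        "moved": moved,
        "added": sorted(added - moved_to),
        "removed": sorted(removed - moved_from),
    }


# Hypothetical revisions: helper() migrates from util to core, extra() is new.
old_revision = {
    "util": "def helper(x):\n    return x + 1\n",
    "core": "def run():\n    return 1\n",
}
new_revision = {
    "util": "",
    "core": (
        "def run():\n    return 1\n\n"
        "def helper(x):\n    return x + 1\n\n"
        "def extra():\n    pass\n"
    ),
}

report = structural_diff(old_revision, new_revision)
print(report)
```

This only recognizes a move when the definition is structurally unchanged; a function that is moved and edited in the same revision would show up as one removal and one addition.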
Another issue with programs being effectively unstructured is that they're not discoverable. If you want to draw a line in Inkscape, you don't need to look up the SVG syntax for drawing a line; you can just hunt around for the "line" button. This is especially important for students, who frequently forget basic things like "how do you define a method" or "what does it mean when a function is outside a class" while learning. Squeak addresses this problem, somewhat: there's still a lot of text floating around, but your program itself is a bunch of objects you can look at.
One of my perpetual second week projects is to make an IDE that understands Python as the serialization format of a graph of objects — modules, classes, and functions — rather than text. This could work on existing Python programs, and it wouldn't need to introduce any wacky new programming paradigms in order to do it: simply treat Python as a runtime and a serialization format, and parse/serialize Python code as if it were any other type of data, like Inkscape does for SVG. Since I'm never realistically going to do it, does anyone else want to? Has somebody else done it already?
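A minimal sketch of the "Python as serialization format" idea, using only the standard `ast` module (`ast.unparse` needs Python 3.9 or later). The sample source, the `outline` helper, and the rename are all hypothetical, and the rename is deliberately naive: it changes every attribute called `greet`, not just this method.

```python
import ast

SOURCE = '''
class Greeter:
    def greet(self, name):
        return "Hello, " + name

def main():
    print(Greeter().greet("world"))
'''


def outline(node, depth=0):
    """Return (depth, kind, name) for each class and function, in source order."""
    rows = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "class" if isinstance(child, ast.ClassDef) else "def"
            rows.append((depth, kind, child.name))
            rows.extend(outline(child, depth + 1))
    return rows


# Deserialize: text -> graph of module, class, and function objects.
tree = ast.parse(SOURCE)
rows = outline(tree)
for depth, kind, name in rows:
    print("  " * depth + kind + " " + name)

# Edit the model rather than the text. This naive rename touches the
# definition and every attribute access that happens to be named "greet".
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef) and node.name == "greet":
        node.name = "salute"
    elif isinstance(node, ast.Attribute) and node.attr == "greet":
        node.attr = "salute"

# Serialize: graph -> text. Whatever is written out this way parses again.
regenerated = ast.unparse(tree)
print(regenerated)
```

Note that the round trip normalizes formatting and drops comments, so a real editor built this way would need a richer model than the stock AST.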
13 comments:
Stuff like this will get a big boost when we gain the ability to create code objects from _modified_ AST objects in plain vanilla cPython. I believe this is coming in 2.6.
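For reference, a minimal sketch of what this looks like in current CPython, where the built-in `compile()` accepts an AST object directly. The `DoubleConstants` transformer and the one-line program are hypothetical, chosen only to show the parse, modify, compile, run cycle.

```python
import ast


class DoubleConstants(ast.NodeTransformer):
    """Hypothetical transformation: double every integer literal."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude True/False explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


tree = ast.parse("result = 20 + 1")
tree = DoubleConstants().visit(tree)
ast.fix_missing_locations(tree)

# compile() takes the modified AST directly; no text round trip is needed.
code = compile(tree, "<modified-ast>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 40 + 2
```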
There was something like that shown at europython 2007.
Pretty cool stuff. The author (Geoffrey French) also made this cool 3d editor.
http://gsculpt.sourceforge.net/
I can't find a link to his other project though.
Model/view separation is an interesting way to think about it. It highlights one major problem, which is that when people write software for converting user input to a structured format (a parser) they rarely tightly couple that to a renderer of that structured data (an unparser).
Maybe I'll do something about this soon with OMeta. Or I might just wait and implement it for that other project I'm working on.
Oh, also: parsers tend to throw away a lot of data. Some of it you might want to throw away (how much whitespace was used, for example), some of it you definitely want to keep (like comments). Unless you develop an AST with comment nodes, this sort of thing will remain uncomfortable.
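Python's own standard library shows the problem in miniature. In this sketch (the one-line sample source is made up), the `ast` round trip silently loses the comment, while the lower-level `tokenize` module still sees it; `ast.unparse` needs Python 3.9 or later.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # the answer, minus 41\n"

# The AST has no node for the comment, so unparsing cannot restore it.
round_tripped = ast.unparse(ast.parse(SOURCE))
print(round_tripped)

# The tokenizer keeps comments, along with their (row, column) position.
comments = [
    (tok.start, tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]
print(comments)
```

A structured editor that wants to keep comments would have to attach this token-level information to its tree somehow, which is exactly the "AST with comment nodes" described above.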
I am still working on it.
http://gsym.sourceforge.net
The first prototype of gSym stored the code as a semantic graph (kind of like an AST). Each node/record contains fields for storing values (string or integer literal for example), or links to other nodes. Much like a structure/object in most languages.
The program looked good, and got people interested, but that's about it.
Unfortunately, coming up with a usable way to interact with and edit that structure proved to be very difficult.
The interaction model I used was based on the idea that the user is always making changes to a document, so simply code up these operations, perform them on the appropriate keypresses, etc, and you are good to go.
Problem is, it turns out that there are so many different things that a programmer wants to do, that it would be like O(n**2) programmer effort (where n is number of different types of AST node) to get a usable product.
Not good.
The second prototype stored the code as a more flexible AST, basically something like LISP. Rather than the meaning of the node being determined by the type of object used to store the node, its meaning is determined by the contents/structure of the node; like LISP.
A meta language has been developed which allows you to specify a language (and display rules). These language description documents can also be displayed in gSym.
This makes gSym language agnostic; you can use gSym to edit code in languages other than Python.
This was a nice point to reach.
Unfortunately, the interaction and editing issue remains.
So I am currently looking at mixing the current interaction model with standard text and parsing techniques. I am hoping that it will yield something more usable.
I would like to say that I will have something cool to show off at Europython 2008. I may not make that date though; may have to wait until PyCon UK.
- Geoff
I'm developing something similar: a Python code editor which represents code as a line tree.
So far, I am able to extend commands and identifiers with their own properties, serialize and deserialize this format to XML files, and parse existing Python source code, even with minor syntax bugs.
The only editing restriction this format has is that the block structure of the code must be preserved.
Project can be found at
http://sourceforge.net/projects/source3ed
This project uses the MVC model. The model is a persistent tree structure. The controller is the Document class, which controls all transformations. The view is a wxWidgets display.
Geoff said -
"Problem is, it turns out that there are so many different things that a programmer wants to do, that it would be like O(n**2) programmer effort (where n is number of different types of AST node) to get a usable product."
That is why I am abandoning ASTs entirely, and trying to come up with the simplest possible semantic model of a program, so that there are a small number of editing operations to support.
But apart from such pipe dreams, I think the key practical thing we can do, as our anonymous blogger host said, is to attach unique internal IDs to our syntax, and to do so persistently. I am writing a paper about that.
There seem to be Pythonistas here: can I ask a question? I was thinking of using Ruby for this paper because it has a semi-standard Ordered Map collection (the Facets Dictionary), which is sorted by the keys of the map. Does Python have the same?
Nice chatting with you. Cheers, Jonathan
Just for the record, Smalltalk browsers edit the program objects directly for everything down to methods (ie you drag-n-drop methods and classes and you go through UI commands to edit them). Actually there is no text syntax to describe the packages/classes/method structure.
The body of each method, however, is written as text and parsed. Saving the method calls the compiler etc and you have one more method object in the program.
There are tools to manipulate the AST inside methods, like http://scg.unibe.ch/Research/Reflectivity/ but they are used more to instrument code or weave aspects than to provide semantic editing to the programmer. We have refactorings but they also go through the parse/modify/pretty-printing steps.
There are also more "toy-like" extensions like eToys or Scratch, based on Squeak, where programs are built like Lego, e.g. take a "while" block, drop a "comparison" one in its condition slot, etc. This is great for discoverability but I don't think it scales to power users and large libraries of code. I think there has to be a compromise between text-based editing and semantic construction blocks, probably much like what TeXmacs does…
Jonathan:
Given a dictionary, d:
# Sort the (key, value) pairs by key, then keep the values in that order.
items = sorted(d.items())
sorted_items = [v for k, v in items]
There may be some built-in class somewhere, but this works for me.
I think modern IDEs actually go a long way toward the idea of representing programs as structured data. For example, Eclipse has a parser that handles a wide variety of syntactically invalid code. I can type a particular (syntactically invalid) line of code, and then query for some random method in the current file or any other file without error. Between the refactoring tools and code navigation tools, it does feel like one is working on a level above raw text.
I'm working in dynamic languages + Emacs right now, but part of me misses the structured tools of Eclipse.
Didn’t we all learn that programs are structured data from lambda calculi (or, well, vice versa: http://diditwith.net/2008/01/01/BuildingDataOutOfThinAir.aspx)?
More seriously, it’s true that having tools that understand the language’s syntax and semantics is cool, but it’s so cool that we already have them: a refactoring “rename” tool is a search and replace that understands the language. It’s interesting (for example, a “diff” tool with that understanding could be useful, as could new ways to visualize program structure and flow), but it’s not something new, and I don’t think it requires new languages (reflection is a way to look at code as data, and it’s already in mainstream languages like C# and Java).
I don’t like subtextual; I don’t know why we should ever need graphical ways of editing code. I don’t think that changing the way we edit code makes coding easier: the thing that is hard for newbies to understand is the logic of code, not the syntax of the language, even if this subtextual thing is really the best attempt I’ve ever seen at that (well, maybe except for Alice, which is really cool). People can easily understand what an “if” does; the problem is learning how to use those blocks to solve the problem at hand…
Oh Glyph. You're describing Xerblin, or what it will become[1]. It doesn't yet have a front end for python, but I have some (years old) code that reads a (tiny) subset of C, and the current version includes two small LLs. One renders to widgets but the other renders to a simple executable data structure (that the system uses to create new compound commands out of the system primitives.)
For instance, the following LL snippet creates a command called "GCD" that will calculate the greatest common divisor of two integers.
j @ tuck mod dup
*GCD = dup j drop
"tuck", "mod", "dup" and "drop" are primitive words (think Forth) while "j" defines a loop and *GCD defines a sequence. The command can be "opened up" and modified directly with the mouse, say, to add a printout of the intermediate values.
The LLs keep track of the mappings from text to tokens to parsed entities (AST) to output format (commands or rendered widgets). I'm going to use this to provide effects like composing widgets by dragging and dropping their text specs, or scanning and parsing incrementally to provide feedback to the user, to work around the invalid-syntax-while-typing issue you mention.
It will be a bit tedious to write a decent parser/renderer "codec" for python (but at least it has a syntax-- wtf ruby? php? huh?) but I plan to sooner or later. Currently I'm focused on getting it ready to be used by normal people so the advanced applications will have to wait.
[1] http://code.google.com/p/xerblin/
I did some work building a structured editor for C#... it's hard work :)
http://guilabs.net