This week I've been in New York City with the rest of the Divmod developers, trying to get our product (service, really) into an acceptable state to start selling it to people.
Aside from some really unfortunate personal issues, this has easily been the best week of the last year. Every day I sat down to do some work, and every day I got something quantifiable done. There are design issues, but there was no thrashing between completely different ways of doing things, only subtle corrections of possible problems.
Even the one really frightening thing that happened this week, a DB_RUNRECOVERY error which spuriously appeared when we attempted to open our database file, turned out to be a peculiarity in the interface of bsddb and not a systemic problem. JP and I managed to get the database re-opened and uncorrupted just by reading the mnet source code and tweaking some variables.
The two stars of this week were really "atop" (atomic transactional object persistence), and "nevow" (pronounced "nouveau", the next version of woven). After a week of intense work, we have a new UI and a new database: the new UI is almost to the point where the old one was, and the new database is far beyond the point where the old one was.
Despite this, there's a huge amount of work remaining, especially to rewrite our input handling and such. Still, I estimate at this point that I'll be keeping my email in this system in 2 weeks or less, and I'm far more sure of that estimate than I have been at any point in the past.
Brian Warner dropped by today and we spent a good chunk of the day speccing out New PB. The New PB will resolve pretty much all the security issues I've come up with in the long purgatory of implementation-less contemplation of my previous work, plus it gives us a good place to define all *kinds* of wacky transport semantics that would otherwise have made the old infrastructure blow up.
In brief, every object that you want to publish remotely will have a schema or interface that specifies how it is serialized or referenced remotely. We will be trying to produce arbitrarily publishable objects that have both state and methods, where you can say how you want them sent by default, but can also switch it up in different situations without defining a bazillion classes.
The only inconvenience is that we are going to spit out a near-continuous stream of warnings if you try to go for the "arbitrary graphs of objects" behavior that the previous version allowed. Sorry, folks, but that's just not safe in Python and it can never be made safe :). However, we will still allow it for prototyping purposes, and maybe even add a mode where the system spits out an example schema after you run "trial" over unit tests that invoke the PB serializer on a particular set of classes.
We also discussed his "Pet Mail" project, which is a replacement for SMTP. It got me thinking about how poorly defined our shared vision for Q2Q is; we've really only started the most basic design discussions. Brian's project is a really good example of the kind of straightforward crypto work that we haven't had any time or inclination to do. However, it (explicitly) has no deployment considerations, so if we can leverage some of the planned features of Q2Q and Quotient to get the base PetMail protocol widely deployed, we can establish a shared, clear basis for verified personal communication. Divmod can build more interesting applications on top of that, and it would also give us a simplified transport layer to sell to other ISPs, so that we don't need to spend all our processing power and disk on brute-force spam filtering.
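To make the schema idea a little more concrete, here is a minimal Python sketch. It is not the New PB design: `serialize`, `example_schema`, `SchemaWarning` and the shape of `__schema__` (a dict mapping attribute names to types) are all hypothetical. Objects that declare a schema get type-checked and only their declared attributes are sent; objects without one still go through for prototyping, but each one triggers a warning and has its attribute types recorded, so a suggested schema can be printed afterwards, roughly like the "example schema after running trial" mode.

```python
import warnings


class SchemaWarning(UserWarning):
    """Warning category for objects serialized without a schema."""


# Attribute types seen on schema-less classes, keyed by class name.
_observed = {}

_PRIMITIVES = (int, float, str, bytes, bool, type(None))


def serialize(obj):
    """Turn obj into plain data, honouring a hypothetical __schema__.

    __schema__ maps attribute names to the type each one must have; only
    those attributes are serialized. Objects without a schema are still
    accepted (for prototyping) but trigger a SchemaWarning and have their
    attribute types recorded.
    """
    if isinstance(obj, _PRIMITIVES):
        return obj
    if isinstance(obj, (list, tuple)):
        return [serialize(item) for item in obj]
    if isinstance(obj, dict):
        return {key: serialize(value) for key, value in obj.items()}
    cls = type(obj)
    state = vars(obj)
    schema = getattr(cls, "__schema__", None)
    if schema is None:
        warnings.warn("serializing %s without a schema" % cls.__name__,
                      SchemaWarning, stacklevel=2)
        seen = _observed.setdefault(cls.__name__, {})
        for name, value in state.items():
            seen[name] = type(value).__name__
        return {name: serialize(value) for name, value in state.items()}
    result = {}
    for name, expected in schema.items():
        value = state[name]
        if not isinstance(value, expected):
            raise TypeError("%s.%s must be %s, not %s" % (
                cls.__name__, name, expected.__name__, type(value).__name__))
        result[name] = serialize(value)
    return result


def example_schema(class_name):
    """Suggest a schema (attribute name -> type name) from what was observed."""
    return dict(_observed.get(class_name, {}))


# Usage: one class that declares a schema, one that doesn't.
class Point:
    __schema__ = {"x": int, "y": int}

    def __init__(self, x, y):
        self.x, self.y = x, y


class Loose:
    def __init__(self):
        self.name = "anything"
        self.count = 3
```

Dropping undeclared attributes, rather than sending everything in `vars(obj)`, is the point of the exercise: the schema, not whatever happens to be hanging off the instance, decides what crosses the wire.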
This has definitely been my best coding day on Divmod so far this month. Into the wee hours of the morning, Allen and I tore through the remaining tasks, which I had estimated would take the rest of the release to complete.
We also had a fun discussion about the successor to twisted.world, CARCOSA (backronymed to "Concurrent, Atomic, Reliable Object Storage Architecture"). Once we can refactor Quotient's store into a somewhat more general structure, and also take advantage of what must be a significant speed boost from using fixed-length rather than variable-length records in bsddb, we should be able to implement everything that we had hoped for in twisted.world and possibly more.
We're probably going to need to add a twisted.python.schema module, which should be shared between Formless, CARCOSA and PB. While CARCOSA deals with storage and Formless/PB deal with interactivity, you should be able to express type constraints the same way in all three.
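The appeal of fixed-length records is that record n always lives at offset n * size, so nothing has to be scanned or length-decoded to find it (Berkeley DB's Queue access method is built around this kind of fixed-length record). The sketch below only illustrates the layout idea with Python's `struct` module; the field layout and the helper names are a made-up example, not CARCOSA's or Quotient's format, and it doesn't touch the bsddb API at all.

```python
import struct

# Hypothetical layout: 64-bit object id, 32-bit type code, 32-bit flags,
# and a name padded with NUL bytes to exactly 16 bytes. "<" means
# little-endian with no alignment padding, so every record is 32 bytes.
RECORD = struct.Struct("<qii16s")


def pack_record(oid, type_code, flags, name):
    encoded = name.encode("utf-8")
    if len(encoded) > 16:
        raise ValueError("name too long for a fixed-length record")
    return RECORD.pack(oid, type_code, flags, encoded)  # pads name with NULs


def unpack_record(data):
    oid, type_code, flags, raw_name = RECORD.unpack(data)
    return oid, type_code, flags, raw_name.rstrip(b"\0").decode("utf-8")


def read_nth(buffer, n):
    """Fetch record n directly: fixed size means offset = n * RECORD.size."""
    start = n * RECORD.size
    return unpack_record(buffer[start:start + RECORD.size])
```

With variable-length records the same lookup needs either an index or a walk over every earlier record's length prefix; here it's one multiplication.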
An object which implements a TypedInterface and also specifies __schema__ itself will be able to be published on the web, transactionally stored in a database, and also published over a custom, interactive protocol, all with no additional work besides what the average Java programmer has to endure when defining a class. Plus, you can develop your objects unencumbered by the schema and only nail them down once you've developed an understanding of their requirements through experimentation.
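Here is a rough sketch of what "expressing type constraints the same way" could look like: one set of constraint objects attached to a class via `__schema__`, consulted both by a storage layer (which only checks values) and by a web/network layer (which also converts incoming strings). `Integer`, `String`, `Message`, `validate_for_storage` and `parse_form` are hypothetical names for illustration, not the API of Formless, PB or any existing twisted.python.schema module.

```python
class Integer:
    """Hypothetical constraint: an int within optional bounds."""

    def __init__(self, minimum=None, maximum=None):
        self.minimum, self.maximum = minimum, maximum

    def check(self, value):
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("expected an int, got %r" % (value,))
        if self.minimum is not None and value < self.minimum:
            raise ValueError("%d is below %d" % (value, self.minimum))
        if self.maximum is not None and value > self.maximum:
            raise ValueError("%d is above %d" % (value, self.maximum))
        return value

    def coerce(self, text):
        """Convert submitted text into a checked value."""
        return self.check(int(text))


class String:
    """Hypothetical constraint: a str no longer than max_length."""

    def __init__(self, max_length):
        self.max_length = max_length

    def check(self, value):
        if not isinstance(value, str):
            raise TypeError("expected a str, got %r" % (value,))
        if len(value) > self.max_length:
            raise ValueError("string longer than %d" % self.max_length)
        return value

    def coerce(self, text):
        return self.check(text)


class Message:
    # One declaration, consulted by every layer below.
    __schema__ = {"subject": String(max_length=80), "priority": Integer(0, 9)}


def validate_for_storage(cls, state):
    """Storage layer: refuse to persist state that violates the schema."""
    return {name: c.check(state[name]) for name, c in cls.__schema__.items()}


def parse_form(cls, form):
    """Web/network layer: turn submitted strings into typed, checked values."""
    return {name: c.coerce(form[name]) for name, c in cls.__schema__.items()}
```

The class itself stays an ordinary class; the schema can be bolted on later, once experimentation has shown what the attributes actually need to be.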
Well, it took most of the day, but once Allen pointed out a stupid mistake I was making by ignoring a potential error condition, the first pass at bsddb/store integration got committed to Quotient! Plus, I took a nap today, so I should have a few more good hours of programming left to get itempool plugged in.
In keeping with the theme of this journal, I decided to update the picture. Some of you may recognize the icon associated with this post :).
Strangely, it did not work until I uploaded an LZW-compressed GIF. It didn't work as a PNG, whether output from Gimp, pngcrush, convert or sodipodi, nor as a JPG from convert or Gimp. I suppose you have to mix the evil of the Sign itself with the evil of the Unisys patent.
Well! It turns out that the bsddb conversion was pretty easy after all. (At least, moving object storage to it.) Now all I have to do is track down this stupid identity management issue and I'm all set.
The only problem is that somehow, objects are sometimes being doubly instantiated when they should only be instantiated once, and sometimes they're not being re-instantiated when they should be garbage collected and re-created.
In the particular issue I'm debugging at the moment, the object relationship is SUPPOSED to be:
stack : item stack : pool : store : weak cache : item
and since the stack is holding a strong reference to the item, it shouldn't go away and be re-created, but somewhere in there there's a mistake.
Is there any common pattern, either for implementing or for debugging, this kind of "this object can be garbage collected back to storage, but it really only exists once" thing?
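The usual name for this is an "identity map": the store keeps a weak mapping from storage ID to the live object, hands back that object whenever it's still alive, and only loads a fresh copy once every strong reference is gone. A minimal sketch using Python's `weakref.WeakValueDictionary` follows; `Store`, `Item` and the `loads` counter are hypothetical, and a plain dict stands in for the database. Two debugging aids are built in: counting real loads, and refusing to register a second live object for an ID that already has one.

```python
import weakref


class Item:
    def __init__(self, store_id, data):
        self.store_id = store_id
        self.data = data


class Store:
    """Hands out at most one live Item per storage id."""

    def __init__(self):
        self._rows = {}                              # stands in for the database
        self._cache = weakref.WeakValueDictionary()  # id -> live Item (weak)
        self.loads = 0                               # real loads, for debugging

    def save(self, item):
        existing = self._cache.get(item.store_id)
        if existing is not None and existing is not item:
            # Debugging aid: two live objects claim the same stored identity.
            raise RuntimeError("duplicate live object for id %r" % (item.store_id,))
        self._rows[item.store_id] = item.data
        # Register new objects too; a creation path that skips the cache is
        # a classic source of two live copies of one stored object.
        self._cache[item.store_id] = item

    def get(self, store_id):
        item = self._cache.get(store_id)  # None if absent or already collected
        if item is None:
            # Not alive anywhere: load from storage. The local name `item`
            # keeps it alive while it is inserted into the weak cache.
            item = Item(store_id, self._rows[store_id])
            self._cache[store_id] = item
            self.loads += 1
        return item
```

The subtle failure mode with weak caches is an object created by some path that never goes through the cache (a direct constructor call, a loader that forgets to register its result); that path is worth auditing first when duplicates show up.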
Today I actually discovered BSDDB. I can't believe I missed this. It's an efficient, in-process, open-source, simple, transactional database which does almost everything I have ever wanted a database to do. For free.
This discovery is both exciting and terrifying. Allen and I ran into this together, pair programming and attempting to reconcile the highly single-process logic in the current Quotient with the highly multi-process logic in the new QQ (Quotient Queues) module we're integrating. After looking at one particular function that was obviously fragile and could lose data at several points, we decided we needed to solidify and centralize our transaction processing into a single place. Since we knew that BSDDB had "some transaction support", we figured we could use it for what we needed.
We got more than we bargained for. One of the first things we discovered was that we had previously misread the documentation: we believed that BSDDB was single-process, because the documentation talked about multi-reader access only being available for non-transactional data stores, and other sections referred to "threads of control".
It turns out that it's perfectly usable from multiple processes simultaneously. "Thread of control" actually means "process, thread, or other encapsulation of a program counter" the way they use it in their documentation.
So, on the one hand, it will make the incredibly arduous task of making our central data store 100% reliable much, much easier than it previously was. Rather than being an ongoing task with many threads left hanging, it will be almost completely done when we finish this refactoring. On the other hand, it increases the number of changes we have to get done for this release, by Saturday. This is a much larger snag than I expected to hit at this point, considering that we'd already gotten through a lot of the "hard stuff": figuring out how to make multi-process communication both transactional and observable from the user interface. (Luckily, much of that work won't be wasted, since we will be using it to manage transactions across multiple machines.)
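The "centralize our transaction processing into a single place" idea looks roughly like this: one helper owns commit and rollback, and every multi-step operation runs inside it, so a failure halfway through can't leave half the data written. This sketch swaps in Python's built-in `sqlite3` module purely because it ships with the standard library; BSDDB's own transaction API is different. The table names and the `deliver` function are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """The one place where commits and rollbacks happen."""
    try:
        yield conn
    except Exception:
        conn.rollback()  # undo every step of the failed operation
        raise
    else:
        conn.commit()


def deliver(conn, message_id, body):
    """Hypothetical multi-step operation: both rows land, or neither does."""
    with transaction(conn):
        conn.execute("INSERT INTO messages (id, body) VALUES (?, ?)",
                     (message_id, body))
        conn.execute("INSERT INTO queue (message_id) VALUES (?)",
                     (message_id,))
```

The fragile version of this is the one where each step commits on its own; then a crash between the two inserts leaves a stored message that nothing will ever process.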
Just finished surveying my work for the day and writing my update about it. It's been a long day and it looks to be one again tomorrow.
Musing about blogging, though. It never really occurred to me how the format blends somewhere between a mailing list and a chat room. The logjam client really brings that out, since the icon for updating the blog is right there on my panel all the time.
I suppose I should decide what to do with this blog. Considering it is likely to be a temporary experiment, I don't want to try for anything terribly ambitious. I will keep the parameters loose, but I think I'll go for a combination of personal rambling and discussion of the design I'm doing at work.
On that note, today I did a lot of thinking about context, which has something to do with blogs. One of the reasons - the only real reason, I'd posit - that blogs are so popular is that email software is so terrible. I should be able to easily put a "new email message to list" button onto my GNOME panel, but this never occurred to me, and it's more work than it's worth. (I did work out that a launcher icon that runs "mozilla -remote 'openURL(mailto:email@example.com)'" will do the trick, but really, if I can't work this out on my own, is the average user likely to? Is this even possible on non-UNIX OSes? With non-Mozilla mail clients? COM doesn't count unless you can do it with the tools generally available with free email clients...)
In principle, though, there is nothing more sophisticated going on here than an email with a couple of X-headers and a cute trick with xmms - not even my preferred mp3 player any more, though I switched back to it just to see the 'detect' button suck the song titles out of it.
But, back to context. The thing about email software that's so terrible is that it presents all messages to you at equal priority. It's a considerable amount of work to re-prioritize messages in a meaningful way, and even if you do, it's very difficult to keep track of your time while working through email so that you spend the appropriate amount of it on each message.
Blogs solve the first problem very well. They don't stack up in your inbox. When you go to read a blog, you are quite explicitly in your "reading a blog" context. Even without any time tracking tools, you know approximately how much time you ought to spend doing that, so they do decently well with the second. The poor UI characteristics of the web don't show up so much because you're just statically reading, and the web does have very good layout tools.
Web forums don't fare so well. They have a noticeable disadvantage against email clients: you can't consolidate your messages, and quoted reply is difficult, broken and inconsistent among implementations. They are much worse at solving the second problem, too: it's very easy to get sucked into a forum discussion and spend hours composing a scathing reply when you should be dealing with other forms of input into your life.
Personally, my vision for Quotient involves a consistent, context-based way of reading messages and holding online conversations with integrated timetracking and task management. When I sit down to read a blog, I will type "I am giving myself an hour to read this today", and at the end of that hour, I want my browser window to be closed and any messages I'm working on to be automatically saved (with undo buffer and clipboard, if possible, because I might have just cut something big out of the message I was editing). A subtle reminder to get on with my life and not get lost online. However, there is no distinction for me between a blog and a mailing list; I want the same time-limiting and monitoring to take place there, including my reply compositions. I can always click the "give me an unbounded amount of time to noodle around on the web" button, but then I should be able to see a visible indication of how much time I've spent.
Tools like this would serve a greater purpose than simply inhibiting 'net addiction - they would give me a way to continue to enjoy and participate in the communities I frequent in a more substantial way without interfering with real life.
In that spirit, I will leave this message unedited so that I don't spend any more of my sleeping time writing it :).