Hobgoblin History

Sunday October 04, 2009
I like Terry Jones; I think FluidDB has a lot of potential.  But, sometimes when he's talking about it, he gets a little carried away and forgets that the rest of us don't live in his future yet.  In his latest missive on the official FluidDB blog, "Digital Hobgoblins", he describes some of the problems that FluidDB sets out to solve.

The problem is, I already have solutions for all of these problems, and I don't quite understand why they don't (or shouldn't) work for me.  (Since he organizes the post in terms of problems that existing systems have, I'm going to take the liberty of re-labeling these in terms of the problems that he seems to be describing rather than the lead text he used.  Please post a comment if you think my labeling is wrong.)

In existing systems, Terry says:

"Things must be named, and have one name."  Specifically, Terry calls out file systems.  Except... file systems have lots of ways of introducing multiple names for the same thing.  Symbolic links.  Hard links, if you really want to allow for ambiguity.  If you want to track that ambiguity, Windows "shortcuts" and MacOS "aliases" can do that.  Overlay mounts, loopback mounts and chroot execution allow for semi-arbitrary renaming.  Lots of other systems support this, too.  Database systems have a specific provision for multiple names: the many-to-one relation.  Any programming language with pass-by-reference data structures allows for some level of multiple-naming.  In fact, there's a whole discipline for allowing things to have lots of different names: indexing.  Anywhere you have a full-text index or an object where multiple attributes are indexed in some kind of database, you've got objects with more than one name.

"You have to be consistent and unambiguous."  As I mentioned on the first point, there are lots of ways to be slightly ambiguous at a human level.  You can refer to the same thing by different names, or, with mutable binding, you can refer to the different things with the same name.  In some circumstances, you must be precise, but that's because fundamentally, algorithmic thinking requries a certain level of precision, not because of any specific problem with computers.  In fact, there is a word for inconsistency and ambiguity in programming languages: polymorphism.  Any time you invoke an interface rather than a concrete implementation (which is to say any time you do anything in a dynamic language like Python) you are being ambiguous and potentially inconsistent in your program's behavior.

"You only get one way to organize stuff."  This is a pretty weak point, though, given that Terry himself immediately turns around and notices that tagging and other multiply-indexed database systems are becoming popular.  So he gives us two examples of exceptions, but no examples of the rule.  I'm not sure what I could add to that.

"Programmers are obsessed with "meaning"."  On this one, I'm going to agree, except I don't think it's a problem.  In the computational world, we are obsessed with the meaning of data, because if you get the meaning of the inputs wrong, then the meaning of the outputs is wrong too.  For example: if you have a number that represents the total liabilities that your company has accumulated, it's pretty important that you don't ever treat that as your total profit.  At a deeper level, if you have a sequence of bits that represents a floating-point number, it's important to know about its intended meaning, and not treat it as a string of characters, unless what you really want is a string.  "@H=N" is not as useful a concept as "3.1287417411804199" if you are trying to add it to something.  For what it's worth, I have my own, similar take on how we should treat computational objects that have multiple meanings: Imaginary. Even systems like Imaginary and FluidDB depend on a very rigid definition of some simpler concepts, like numbers consistently being numbers and words consistently being words.  In my view, even if we treat the book itself as multifaceted, it's important to know what the data representing the "readable object" part of a book is really "about", and make sure it stays distinct from the data representing the "paperweight" part of the book.  To be fair, FluidDB appears to do this itself — and this terminology is my least-favorite part of FluidDB — by having single-purpose, permission-controlled "objects" just like every other system, but calling them "tags", and re-using the word "objects" to refer instead to what others might call a "UUID" or "central index".  In Imaginary, the system is similar; although the centrality of the FluidDB "object" (in Imaginary's case, the "Thing") is less stark; using FluidDB's terminology, in Imaginary, a "tag" can have a "tag" of its own; in fact, there's nothing but tags ("Items") anywhere.

"Metadata is separated from the data it describes."  This may be true in some systems, but the web is probably the system with the most data in it anywhere, and in that system, metadata is always available as part of the request and the response.  You can put in any headers you want in the response, and there are lots of pieces of metadata (like content-type) which are almost always found along with the data.  In my opinion, the problem is more that we don't have enough of the previous problem.  Web developers haven't been obsessed enough with meaning: there aren't enough useful conventions around the HTTP request/response metadata, and so it's hard to bundle more metadata in with your response and have it faithfully propagated elsewhere.  We don't know what arbitrary headers might mean, because we don't have any way of expressing a schema for them.

Terry says he's going to write more about these problems, and the solutions that FluidDB provides for them.  I'm looking forward to it.  As part of that, I'd really like to see a clear description of how these problems affect me, or someone I know, either as a programmer or as a user.  What do I, or should I, really want to do with some application right now that these five problems are preventing me from doing?

The reason I felt compelled to write about this is that history — and particularly the history of websites like freshmeat and sourceforge — is littered with the corpses of projects which promised to fundamentally change the way we represent data.  A common problem with these projects is that they have expansive denunciations of current techniques to represent data, or manage persistence, and claim to provide an advance so significant that they will displace all current applications.  What most of the people working on these projects don't realize is that the current techniques for representing data have a history, and there are good reasons for their limitations.  Granted, not all of those reasons are currently relevant, and many are examples of path dependence, but it's still important to understand the reasons in order to escape the problems.

In FluidDB's case, I think that the problem isn't so much that Terry doesn't have the historical perspective, but that he assumes that we all do.  And that we can all make the cognitive leap to see why FluidDB is necessary.  But if I can't do it, I have to assume there are at least a few other programmers who aren't getting the message either.