( )
The Web, Untangled

Fri 11 September 2009

In my previous post, I outlined some reasons that web development is worse than other kinds of development (specifically: traditional client-server development).  I left off there saying that I had some prescriptions for the web's ailments, though, and now I'll describe those.  Given that we're stuck with web development for the forseeable future, how can we make it a tolerable experience?

First, let me tell you what the answer isn't.  It isn't a continuation of the traditional "web framework" strategy.  These have been important tools in dredging the conceptual mire of the web for useful patterns, and at this point in history they have a long life ahead of them.  I'm not predicting the death of Django or Rails any time soon.  Django and Rails are the stucco of the web.  An important architectural innovation, to be sure: they let you cover over the materials underneath, allowing you to build structures that are appealing without fundamentally changing the necessarily ugly underpinnings.  But you can't build a skyscraper out of stucco.

As Jacob covered in great detail in his talk, innovations in the "framework" space generally involve building more and more abstractions, creating more and more new concepts to simplify the underlying concepts.  Eventually you run out of brain-space for new concepts, though, and you have to start over.

I started here by saying that we're stuck with the web.  If we can understand why we're stuck with the web, we can make it a pleasant place to be.  Of course everybody has their own ideas about what makes the web great, but it's important to remember that none of that is what makes the web necessary.

What makes the web necessary is very simple: a web browser is a turing complete execution environment, and everyone has one.  It's also got a feature-complete (if highly idiosyncratic) widget set, so you can display text, images, buttons, scrollbars, and menus, and compose your own widgets (sort of).  Most importantly, it executes code without prompting the user, which means the barrier to adoption of new applications is at zero.  Not to mention that, thanks to the huge ecosystem of existing applications, the user is probably already running a web browser.

I feel it's important to emphasize this point.  When developing an application, delivery is king.  It doesn't matter how great your application is if no users ever run it, and given how incredibly cheap in terms of user effort it is to run an application in a web browser, your application has to be really, really awesome to get them to do more work than clicking on a link.  I can't find the article, but I believe Three Rings once did an interview where they explained that some huge percentage of users (if I remember correctly, something like 90%) will leave immediately if you make them click on a "download" link to play the game, but they'll stick around if you can manage to keep it in the browser without making them download a plugin.

Improvements to ECMAScript and HTML sound fun, but if, tomorrow morning, somebody figured out how to securely execute x86 machine code on web browsers, and distribute that capability to every browser on the internet, developers would start using that almost immediately.  HTML-based applications would slowly die out, as their UIs would be comparatively slow, clunky, and limited.

Tools like the Google Web Toolkit (and Pyjamas, its Python clone), recognized this fact early on.  They treat the browser as what the browser should be: a dumb run-time.  A deployment target, not a development environment.  Seen in this light, it's possible to create layers for integration and inter-op above the complexity soup of DOM and JavaScript: despite the fact that the browser itself has no "linker" to speak of, and no direct support for library code, with GWT you get Java's library mechanism.

Although it's not particularly well-maintained, PyPy also has a JavaScript back-end, which allows you to run a restricted subset of Python ("RPython") in a web browser; I hope that in the future this will be expanded to give us a more realistic, full-featured Python VM in the browser than Pyjamas' fairly simplistic translation currently does.  In opposition to the "worrying trend" that Jacob noted, with individual applications needing to write new, custom run-times, they leverage an existing language ecosystem rather than inventing something new.

Using tools like these, you can write code in the same language client-side and server-side.  This simplifies testing.  You can at least get basic test coverage in one pass, in one runtime, even if some of that code will actually run in a different runtime later.  It simplifies implementation and maintenance, too.  You can write functions and then decide to run them somewhere else based on deployment, security, or performance concerns without necessarily rewriting them from scratch.

If toolkits like these gained more traction, it would go a long way towards interop, too.  It would be a lot easier to have an FFI between Python-in-the-browser and Java-in-the-browser than to try to wrangle every possible JavaScript hack in the book.  Similarly on the server side: once a few frameworks can standardize on rich client-server communication channels, it will be easier to have a high-level abstraction over those than over the mess of XmlHttpRequest and its various work-alikes.

There's still an important component missing, though.  Web applications almost always have 3 tiers.  I've already discussed what should happen on the first tier, the browser.  And, as GWT, NaCl and Pyjamas indicate, there are folks already hard at work on that.  The middle tier is basically already okay; server-side frameworks allow you to work with "business logic" in a fairly sane manner.  What about the database tier?

The most common complaint about the database tier is security.  Since half the time your middle tier needs to be generating strings of SQL to send to the database, there are a plethora of situations where an accidental side-channel is created, allowing users to directly access the database.

This is a much more tractable problem than the front-end problem.  For one thing, a really well-written framework, one which doesn't encourage you to execute SQL directly, can comprehensively deal with the security issue.  Similarly, a good ORM will allow you complete access to the useful features of your database without forcing you to write code in two different programming languages.

Still, there's a huge amount of wasted effort on the database side of things.  Pretty much every major database system has a sophisticated permission system that nobody really uses.  If you want to write stored procedures, triggers, or constraints in a language like Python, it is at worst impossible and at best completely non-standard and very confusing.  Finally, if you want to test anything... you're not entirely on your own, but it's going to be quite a bit harder than testing your middle-tier code.

One part of the solution to this problem comes, oddly enough, from Microsoft: LINQ, the Langauge Integrated Query component, provides a standard syntax and runtime model for queries executed in multiple different languages.  More than providing a nice wrapper over database queries, it allows you to use the same query over in-memory objects with no "database engine" to speak of.  So you can write and test your LINQ code in such a way that you don't need to talk to a database.  When you hook it up to a database, your application code doesn't even really need to know.

The other part of the solution comes from SQLite.  Right now, managing the deployment of and connection to a database is a hugely complex problem.  You have to install the software, write some config files, initialize your database, grant permissions to your application user, somehow get credentials from the database to the application, connect from the application to the database, and verify that the database's schema is the same as what the application expects.  And that's before you can even do anything!  Once you're up and running, you need to manage upgrades, schedule downtime for updating the database software (independently of upgrading the application software).  Note that the database can't be a complete solution for the application's persistence needs, either, because in order to tell the application where it needs to find the rest of its data, you need, at the very least, a hostname, username, and password for the database server.

All of this makes testing more difficult - with all those manual steps, how can you really know if your production configuration is the same as your test configuration?  It also makes development more difficult: if automatically spinning up a new database instance is hard, then you end up with a slightly-nonstandard manual database setup for each developer.  With SQLite, you can just say "database, please!" from your application code, specifying all the interesting configuration right there.

Finally, SQLite allows you to very easily write stored procedures and triggers in your "native" language.  You also don't need to quite as much, because your application can much more easily completely control access to its database, but if you want to work in the relational model it's fairly simple.  The stored procedures are just in memory, and are called like regular functions, not in an obscure embedded database environment.

In other words, for modern web applications, a database engine is really just a library.  The easier it is to treat it like one, the easier it is to deploy and manage your application.

In the framework of the future, I believe you'll be able to write UI code in Python, model code in Python, and data-manipulation code in Python.  When you need to round a number to two digits, you'll just round it off, and it'll come out right.