I didn't put the full title of this article into the "subject" because it was super long.

It Is Pretty Much a Bad Idea to Expose Raw SQL Through Your Database Access Layer

Fun Things I Found Out About Your Company With Administrator Access to your Database

I was originally inspired to write something about this when I read Jonathan Ellis remarking that ORMs should include direct SQL access because Django recently added a different, but still dodgy, 'OR'-operator support to a syntactically disappointing ORM query syntax. However, it really came to a boil for me when I found that Divmod had been running some third-party software with an SQL injection vulnerability. (Yes, we have since patched it, no harm done.)

Security experts have long known that code-injection attacks are pretty easy on many popular programming platforms, and you should take steps to prevent them. It's easy to find commentary on this. Stephen Thorne has had many amusing and insightful things to say about PHP's vulnerability to SQL injection attacks, as well as the occasional dig about just injecting PHP code itself. If you're looking for something more serious, Steve Friedl has written a fairly comprehensive guide to understanding, executing, and preventing against SQL injection.

An ORM's job is to provide an alternative interface to a database. Interfaces should be complete things, not broken fragments of utility which require manual crank-turning to function. If you have to use a different mechanism to access the database within your application, the ORM is incomplete and should be fixed. Sure, many programmers who use ORMs also know SQL, and that is a useful skill, because today these are in closely related problem domains, but they should not have to use SQL within the same context that is using the ORM.

The python "os" module provides (among other things) an alternative interface to a large portion of the POSIX C API. As I said, interfaces should be complete things. If you use a different mechanism to access the POSIX C API, the "os" module is incomplete and should be fixed. Again, sure: many programmers who use Python's "os" module also know the POSIX C API, and that is a useful skill, because these are related problem domains, but they should not have to use the POSIX C API within the same context that is using the "os" module.

In both cases, you can generate the underlying code yourself, and in both cases, people sometimes really need to, so the fact that you can is important. However, Few C programmers ever want to drop back down to C when they're using Python, and will rightly avoid it (as a complexity cost) when they can; yet many Python programmers who use ORMs frequently and loudly declare that they want to use SQL all the time.

Not to pick on Mr. Ellis. The syntax he's reacting to really is abhorrent (although that's no comment on Django as a whole), and the tremendous Ruby on Rails movement seems to largely agree with his point. There are good reasons that people want SQL access from within ORMs. It's simple: most ORMs are really, really awful. They are heavy on the "object" and not so much on the "database". There are a lot of features that SQL provides which they don't expose.

If you use an open-source ORM and find yourself wanting to use SQL to get at one of those features, consider not clamoring for (or not using, if one exists) an SQL execution back-door included in the library. Instead, consider ways to integrate that SQL feature with the existing structure of the ORM, or an extension which wraps that SQL feature on top of the ORM and only generates the SQL you need in one small place. Obviously this goes double if you are a user of Axiom - I do my best to accept any patches that expose new database features that were previously obscured.

Don't just accept the status quo and generate SQL strings from within your application. Originally this post was going to be longer and talk about API structure and communication between programmers and preconditions and postconditions and all kinds of fancy computer-science garbage, but I think it would be better to leave you with just this one thought - the security implications alone are more than enough reason to be extremely sparing, and careful, with the places that your code generates SQL. Isolate it, test it, audit it, and don't make it a habit.

Update: This article is confusingly titled. In fact, Axiom does provide an API for getting at SQL. Store.executeSQL. The point I am stressing here is that axiom does not "expose" it in that it is not a supported, public API, and if you have to call it, Axiom is broken and you should let Divmod know what you needed it for. To attempt to totally deny access to that layer would be unwise; as I said earlier in the article, "you can generate the underlying code yourself, and in both cases, people sometimes really need to, so the fact that you can is important".

Why Axiom Doesn't Expose SQL

It Is Pretty Much a Bad Idea to Expose Raw SQL Through Your Database Access Layer

Fun Things I Found Out About Your Company With Administrator Access to your Database