A collection of articles, ideas, and rambling from a guy who wrote some software that one time.

Saturday, December 31, 2005

Why Axiom Doesn't Expose SQL

I didn't put the full title of this article into the "subject" because it was super long.

It Is Pretty Much a Bad Idea to Expose Raw SQL Through Your Database Access Layer

or

Fun Things I Found Out About Your Company With Administrator Access to your Database

I was originally inspired to write something about this when I read Jonathan Ellis remarking that ORMs should include direct SQL access because Django recently added a different, but still dodgy, 'OR'-operator support to a syntactically disappointing ORM query syntax. However, it really came to a boil for me when I found that Divmod had been running some third-party software with an SQL injection vulnerability. (Yes, we have since patched it, no harm done.)

Security experts have long known that code-injection attacks are pretty easy on many popular programming platforms, and you should take steps to prevent them. It's easy to find commentary on this. Stephen Thorne has had many amusing and insightful things to say about PHP's vulnerability to SQL injection attacks, as well as the occasional dig about just injecting PHP code itself. If you're looking for something more serious, Steve Friedl has written a fairly comprehensive guide to understanding, executing, and preventing against SQL injection.

An ORM's job is to provide an alternative interface to a database. Interfaces should be complete things, not broken fragments of utility which require manual crank-turning to function. If you have to use a different mechanism to access the database within your application, the ORM is incomplete and should be fixed. Sure, many programmers who use ORMs also know SQL, and that is a useful skill, because today these are in closely related problem domains, but they should not have to use SQL within the same context that is using the ORM.

The python "os" module provides (among other things) an alternative interface to a large portion of the POSIX C API. As I said, interfaces should be complete things. If you use a different mechanism to access the POSIX C API, the "os" module is incomplete and should be fixed. Again, sure: many programmers who use Python's "os" module also know the POSIX C API, and that is a useful skill, because these are related problem domains, but they should not have to use the POSIX C API within the same context that is using the "os" module.

In both cases, you can generate the underlying code yourself, and in both cases, people sometimes really need to, so the fact that you can is important. However, Few C programmers ever want to drop back down to C when they're using Python, and will rightly avoid it (as a complexity cost) when they can; yet many Python programmers who use ORMs frequently and loudly declare that they want to use SQL all the time.

Not to pick on Mr. Ellis. The syntax he's reacting to really is abhorrent (although that's no comment on Django as a whole), and the tremendous Ruby on Rails movement seems to largely agree with his point. There are good reasons that people want SQL access from within ORMs. It's simple: most ORMs are really, really awful. They are heavy on the "object" and not so much on the "database". There are a lot of features that SQL provides which they don't expose.

If you use an open-source ORM and find yourself wanting to use SQL to get at one of those features, consider not clamoring for (or not using, if one exists) an SQL execution back-door included in the library. Instead, consider ways to integrate that SQL feature with the existing structure of the ORM, or an extension which wraps that SQL feature on top of the ORM and only generates the SQL you need in one small place. Obviously this goes double if you are a user of Axiom - I do my best to accept any patches that expose new database features that were previously obscured.

Don't just accept the status quo and generate SQL strings from within your application. Originally this post was going to be longer and talk about API structure and communication between programmers and preconditions and postconditions and all kinds of fancy computer-science garbage, but I think it would be better to leave you with just this one thought - the security implications alone are more than enough reason to be extremely sparing, and careful, with the places that your code generates SQL. Isolate it, test it, audit it, and don't make it a habit.

Update: This article is confusingly titled. In fact, Axiom does provide an API for getting at SQL. Store.executeSQL. The point I am stressing here is that axiom does not "expose" it in that it is not a supported, public API, and if you have to call it, Axiom is broken and you should let Divmod know what you needed it for. To attempt to totally deny access to that layer would be unwise; as I said earlier in the article, "you can generate the underlying code yourself, and in both cases, people sometimes really need to, so the fact that you can is important".

3 comments:

fuz22 said...

The major lesson I took from Friedl's piece is that it is wise to use prepared statements.

I have issue with this.

Every API implementation of prepared statements I've seen has relied on integer-indexed variable setting. This is the stupidest thing ever. If I wanted to be doing a lot of hard-coded indexing, I'd be coding in C.

So: Is there an implementation of prepared statements in a high-level language that uses name-based set syntax? If not, am I simply doomed either to be vulnerable to code injection or to use stupid code practices?

The solution I've been using is to replace all quotes with their html equivalents:
s = s.replaceAll("\"", """);
s = s.replaceAll("'", "'");
which has the advantage that the replaced text has no quote characters in it. Friedl implies that this is easily broken if you just know how. Understanding that method would sure light a fire under my willingness to change over.

fuz22 said...

1) I am using Java. The Javadoc is here:
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/PreparedStatement.html

There do not appear to be any named binding mechanisms. This is a problem for me because I frequently use SQL statements of up to 20 parameters.

2) I think your analogy to functions is specious. It's not that they are merely ordered, it is that they are bound to integer indexes. This seems more like using a huge of typed variables for your program named var1, var2, var3, etc..

3) I fail to understand how this would work, and I would really like to. Explain further?

4) heh. I.. uh... totally fucked up the code cut-n-paste. My code replaces annoying characters with their HTML entity replacements, not what appears above. So backslashes don't get replaced at all - the \" there is escaping the double quote in the java string.

5) I make no claims that this is a good general solution, but all of my strings are going to or from HTML.

6) I totally agree that PreparedStatements are the most correct way of dealing with this issue for security. I'm just not sure that for all cases, the most secure way is needed, particularly if I have to use unnamed bindings.

glyf said...

For those of you still thinking that maybe manual SQL generation is a good idea: it isn't, I won the argument :-).