Declaratively

This weekend a catastrophic bug in log4j2 was disclosed, leading to the potential for remote code execution on a huge number of unpatched endpoints.

In this specific case, it turns out there was not really any safe way to use the API. Initially it might appear that the issue was the treatment of an apparently fixed format string as a place to put variable user-specified data, but as it turns out it just recursively expands the log data forever, looking for code to execute. So perhaps the lesson here is nothing technical, just that we should remain ready to patch, or that we should pay the maintainers.

Still, it’s worth considering that injection vulnerabilities of this type exist pretty much everywhere, usually in places where the supposed defense against getting catastrophically RCE’d is to carefully remember that the string that you pass in isn’t that kind of string.

While not containing anything nearly so pernicious as a place to put a URL that lets you execute arbitrary attacker-controlled code, Python’s logging module does contain a fair amount of confusing indirection around its log message. Sometimes — if you’re passing a non-zero number of *args — the parts of the logging module will interpret msg as a format string; other times it will interpret it as a static string. This is in some sense a reasonable compromise; you can have format strings and defer formatting if you want, but also log.warning(f"hi, {attacker_controlled_data}") is fairly safe by default. It’s still a somewhat muddled and difficult to document situation.

Similarly, Twisted’s logging system does always treat its string argument as a format string, which is more consistent. However, it does let attackers put garbage into the log wherever the developer might not have understood the documentation.¹

This is to say nothing of the elephant in the room here: SQL. Almost every SQL API takes a bunch of strings, and the ones that make you declare an object in advance (i.e. Java’s PreparedStatement) don’t mind at all if you create one at runtime.

In the interest of advancing the state of the art just a little here, I’d like to propose a pattern to encourage the idiomatic separation of user-entered data (i.e. attacker-controlled payloads) from pre-registration of static, sensitive data, whether it’s SQL queries, format strings, static HTML or something else. One where copying and pasting examples won’t instantly subvert the intended protection. What I suggest would look like this:

# module scope
with sql_statements.declarations() as d:
    create_table = d.declare("create table foo (bar int, baz str)")
    save_foo = d.declare("insert into foo values (?, ?)")
    load_by_bar = d.declare("select * from foo where bar = :bar")

# later, inside a function
con = sqlite3.connect(":memory:")
cur = con.cursor()
create_table.run(cur)
save_foo.run(cur, 3, "hello")
save_foo.run(cur, 4, "goodbye")
print((list(load_by_bar.run(cur, bar=3))))

The idea here is that sql_statements.declarations() detects which module it’s in, and only lets you write those declarations once. Attempting to stick that inside your function and create some ad-hoc formatted string should immediately fail with a loud exception; copying this into the wrong part of your code just won’t work, so you won’t have a chance to create an injection vulnerability.

If this idea appeals to you, I’ve written an extremely basic prototype here on github and uploaded it to PyPI here.

I’m not dropping a 0day on you, there’s not a clear vulnerability here; it only lets you draw data from explicitly-specified parameters into the log. If you use it wrong, you just might get an "Unable to format event" type error, which we'll go out of our way to not raise back to you as an exception. It just makes some ugly log messages. ↩