This weekend a catastrophic bug in log4j2
was
disclosed,
leading to the potential for remote code execution on a huge number of
unpatched endpoints.
In this specific case, it turns out there was not really any safe way to use the API. Initially it might appear that the issue was the treatment of an apparently fixed format string as a place to put variable user-specified data, but as it turns out it just recursively expands the log data forever, looking for code to execute. So perhaps the lesson here is nothing technical, just that we should remain ready to patch, or that we should pay the maintainers.
Still, it’s worth considering that injection vulnerabilities of this type exist pretty much everywhere, usually in places where the supposed defense against getting catastrophically RCE’d is to carefully remember that the string that you pass in isn’t that kind of string.
While not containing anything nearly so pernicious as a place to put a URL that
lets you execute arbitrary attacker-controlled code, Python’s
logging
module does contain a fair amount of confusing indirection around its log
message. Sometimes — if you’re passing a non-zero number of *args
— the
parts of the logging module will interpret msg
as a format string; other
times it will interpret it as a static string. This is in some sense a
reasonable compromise; you can have format strings and defer formatting if you
want, but also log.warning(f"hi, {attacker_controlled_data}")
is fairly
safe by default. It’s still a somewhat muddled and difficult to document
situation.
Similarly, Twisted’s logging system does always treat its string argument as a format string, which is more consistent. However, it does let attackers put garbage into the log wherever the developer might not have understood the documentation.1
This is to say nothing of the elephant in the room here: SQL. Almost every SQL
API takes a bunch of strings, and the ones that make you declare an object in
advance (i.e. Java’s PreparedStatement
) don’t mind at all if you create one
at runtime.
In the interest of advancing the state of the art just a little here, I’d like to propose a pattern to encourage the idiomatic separation of user-entered data (i.e. attacker-controlled payloads) from pre-registration of static, sensitive data, whether it’s SQL queries, format strings, static HTML or something else. One where copying and pasting examples won’t instantly subvert the intended protection. What I suggest would look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
The idea here is that sql_statements.declarations()
detects which module it’s
in, and only lets you write those declarations once. Attempting to stick
that inside your function and create some ad-hoc formatted string should
immediately fail with a loud exception; copying this into the wrong part of
your code just won’t work, so you won’t have a chance to create an injection
vulnerability.
If this idea appeals to you, I’ve written an extremely basic prototype here on github and uploaded it to PyPI here.
-
I’m not dropping a 0day on you, there’s not a clear vulnerability here; it only lets you draw data from explicitly-specified parameters into the log. If you use it wrong, you just might get an "Unable to format event" type error, which we'll go out of our way to not raise back to you as an exception. It just makes some ugly log messages. ↩