One of the wonderful things about Python is the ease with which you can start
writing a script: just drop some code into a .py file, and run
python my_file.py. Similarly, it’s easy to get started with modularity: split
my_file.py into my_app.py and my_lib.py, and you can import my_lib from
my_app.py and start organizing your code into modules.
However, the details of the machinery that makes this work have some
surprising, and sometimes very security-critical, consequences: the more
convenient it is for you to execute code from different locations, the more
opportunities an attacker has to execute it as well...
Python needs a safe space to load code from
Here are three critical assumptions embedded in Python’s security model:
- Every entry on sys.path is assumed to be a secure location from which
  it is safe to execute arbitrary code.
- The directory where the “main script” is located is always on sys.path.
- When invoking python directly, the current directory is treated as the
  “main script” location, even when passing the -c or -m options.
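To check these assumptions against a real interpreter, it can help to list which sys.path entries resolve to the current directory. The following is a minimal sketch, not part of Python itself: entries_resolving_to_cwd is a hypothetical helper name, and it assumes that an empty or relative entry is resolved against the current directory.

```python
import os
import sys


def entries_resolving_to_cwd(path_entries, cwd):
    """Return the entries of `path_entries` that point at `cwd`.

    Assumption: an empty or relative entry is resolved against `cwd`,
    so "" and "." both count as the current directory.
    """
    cwd = os.path.realpath(cwd)
    risky = []
    for entry in path_entries:
        # os.path.join leaves absolute entries untouched and anchors
        # relative ones (including "") at cwd.
        resolved = os.path.realpath(os.path.join(cwd, entry))
        if resolved == cwd:
            risky.append(entry)
    return risky


# Report which entries of this interpreter's sys.path are the cwd.
print(entries_resolving_to_cwd(sys.path, os.getcwd()))
```

Running this from different directories, and via python script.py versus python -m or python -c, is one way to see which invocations put your working directory on the import path.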
If you’re running a Python application that’s been installed properly on your
computer, the only location outside of your Python install or virtualenv that
will be automatically added to your sys.path (by default) is the location
where the main executable, or script, is installed.
For example, if you have pip installed in /usr/bin, and you run /usr/bin/pip,
then only /usr/bin will be added to sys.path by this feature. Anything that can
write files to /usr/bin can already make you, or your system, run stuff, so
it’s a pretty safe place. (Consider what would happen if your ls executable got
replaced with something nasty.)
However, one emerging convention is to prefer calling /path/to/python -m pip in
order to avoid the complexities of setting up $PATH properly, and to avoid
dealing with divergent documentation of how scripts are installed on Windows
(usually as .exe files these days, rather than .py files).
This is fine — as long as you trust that you’re the only one putting files into
the places you can import from — including your working directory.
Your “Downloads” folder isn’t safe
As the category of attacks with the name “DLL Planting” indicates, there are
many ways that browsers (and sometimes other software) can be tricked into
putting files with arbitrary filenames into the Downloads folder, without user
interaction.
Browsers are starting to take this class of vulnerability more seriously, and
adding various mitigations to avoid allowing sites to surreptitiously drop
files in your downloads folder when you visit them.
Even with mitigations though, it will be hard to stamp this out entirely: for
example, the Content-Disposition HTTP header’s filename* parameter exists
entirely to allow the site to choose the filename that it downloads to.
Composing the attack
You’ve made a habit of python -m pip to install stuff. You download a Python
package from a totally trustworthy website that, for whatever reason, has a
Python wheel by direct download instead of on PyPI. Maybe it’s internal, maybe
it’s a pre-release; whatever. So you download totally-legit-package.whl, and
then:
| ~$ cd Downloads
~/Downloads$ python -m pip install ./totally-legit-package.whl
|
This seems like a reasonable thing to do, but unbeknownst to you, two weeks ago,
a completely different site you visited had some XSS JavaScript on it that
downloaded a pip.py with some malware in it into your downloads folder.
Boom.
Demonstrating it
Here’s a quick demonstration of the attack:
| ~$ mkdir attacker_dir
~$ cd attacker_dir
~/attacker_dir$ echo 'print("lol ur pwnt")' > pip.py
~/attacker_dir$ python -m pip install requests
lol ur pwnt
|
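The demonstration works because a pip.py in the working directory is found before the real pip. As a rough illustration, the sketch below checks whether a directory contains files that could shadow a given list of module names. shadowing_candidates is a hypothetical helper, and the check is deliberately simplified: it only looks for name.py and name/__init__.py, not for compiled extensions or other importable forms.

```python
import os


def shadowing_candidates(directory, module_names):
    """Return the names from `module_names` that `directory` could shadow.

    Simplified check: a name counts if the directory contains
    `<name>.py` or a package `<name>/__init__.py`.
    """
    found = []
    for name in module_names:
        as_module = os.path.join(directory, name + ".py")
        as_package = os.path.join(directory, name, "__init__.py")
        if os.path.isfile(as_module) or os.path.isfile(as_package):
            found.append(name)
    return found


# Example: check the current directory for a few commonly run modules.
print(shadowing_candidates(os.getcwd(), ["pip", "venv", "json"]))
```

An empty list from a check like this does not prove a directory is safe; it only covers the names you thought to ask about.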
PYTHONPATH surprises
Just a few paragraphs ago, I said:
If you’re running a Python application that’s been installed properly on your
computer, the only location outside of your Python install or virtualenv that
will be automatically added to your sys.path (by default) is the location
where the main executable, or script, is installed.
So what is that parenthetical “by default” doing there? What other directories
might be added?
Any entries on your $PYTHONPATH environment variable. You wouldn’t put your
current directory on $PYTHONPATH, would you?
Unfortunately, there’s one common way that you might have done so by accident.
Let’s simulate a “vulnerable” Python application:
| # tool.py
try:
    import optional_extra
except ImportError:
    print("extra not found, that's fine")
|
Make two directories: install_dir and attacker_dir. Drop this in install_dir.
Then, cd attacker_dir and put our sophisticated malware there, under the module
name that tool.py tries to import:
| # optional_extra.py
print("lol ur pwnt")
|
Finally, let’s run it:
| ~/attacker_dir$ python ../install_dir/tool.py
extra not found, that's fine
|
So far, so good.
But, here’s the common mistake. Most places that still recommend PYTHONPATH
recommend adding things to it like so:
| export PYTHONPATH="/new/useful/stuff:$PYTHONPATH";
|
Intuitively, this makes sense; if you’re adding project X to your $PYTHONPATH,
maybe project Y had already added something, maybe not; you never want to blow
it away and replace what other parts of your shell startup might have done with
it, especially if you’re writing documentation that lots of different people
will use.
But this idiom has a critical flaw: the first time it’s invoked, if $PYTHONPATH
was previously either empty or unset, the result ends with a trailing colon.
That trailing colon adds an empty string as an entry, and an empty entry
resolves to the current directory. Let’s try it:
| ~/attacker_dir$ export PYTHONPATH="/a/perfectly/safe/place:$PYTHONPATH";
~/attacker_dir$ python ../install_dir/tool.py
lol ur pwnt
|
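The empty entry becomes visible once the resulting string is split on colons, the way a colon-separated path list is read. Here is a minimal sketch in Python, assuming a POSIX-style ":" separator; build_pythonpath is a hypothetical helper that only mimics the shell expansion and is not how Python itself parses the variable.

```python
def build_pythonpath(new_entry, previous):
    """Mimic the shell expansion "new_entry:$PYTHONPATH".

    `previous` is None when the variable is unset; the shell expands an
    unset variable to an empty string, just like an empty one.
    """
    return new_entry + ":" + (previous or "")


value = build_pythonpath("/a/perfectly/safe/place", None)
entries = value.split(":")
print(entries)  # ['/a/perfectly/safe/place', '']
```

The second element is the empty string left behind by the trailing colon.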
Oh no! Well, just to be safe, let’s empty out $PYTHONPATH and try it again:
| ~/attacker_dir$ export PYTHONPATH="";
~/attacker_dir$ python ../install_dir/tool.py
lol ur pwnt
|
Still not safe!
What’s happening here is that if PYTHONPATH is empty, that is not the same
thing as it being unset. From within Python, this is the difference between
os.environ.get("PYTHONPATH") == "" and os.environ.get("PYTHONPATH") is None.
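A small sketch of that distinction, using only os.environ. pythonpath_state is a hypothetical helper name; it takes any mapping so that it can be tried without touching your real environment.

```python
import os


def pythonpath_state(environ):
    """Classify PYTHONPATH in `environ` as 'unset', 'empty', or 'set'."""
    value = environ.get("PYTHONPATH")
    if value is None:
        # The variable does not exist at all (e.g. after `unset`).
        return "unset"
    if value == "":
        # The variable exists but holds an empty string.
        return "empty"
    return "set"


# Report the state of the variable in the current process.
print(pythonpath_state(os.environ))
```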
If you want to be sure you’ve cleared $PYTHONPATH from a shell (or somewhere
in a shell startup), you need to use the unset command:
| ~/attacker_dir$ unset PYTHONPATH
~/attacker_dir$ python ../install_dir/tool.py
extra not found, that's fine
|
Setting PYTHONPATH used to be the most common way to set up a Python
development environment; hopefully it’s mostly fallen out of favor, with
virtualenvs serving this need better. If you’ve got an old shell configuration
that still sets a $PYTHONPATH that you don’t need any more, this is a good
opportunity to go ahead and delete it.
However, if you do need an idiom for “appending to” PYTHONPATH in a shell
startup, use this technique:
| export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}new_entry_1"
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}new_entry_2"
|
In both bash and zsh, this results in
| $ echo "${PYTHONPATH}"
new_entry_1:new_entry_2
|
with no extra colons or blank entries on your $PYTHONPATH variable now.
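If you build the variable from Python rather than from a shell (for example, before launching a subprocess), the same rule can be sketched as follows. append_entry is a hypothetical helper, and the ":" separator is an assumption that matches the shell examples above.

```python
def append_entry(current, new_entry, sep=":"):
    """Rough equivalent of "${PYTHONPATH:+${PYTHONPATH}:}new_entry".

    `current` may be None (unset) or "" (empty); in both cases the
    result is just `new_entry`, with no stray separator.
    """
    if current:
        return current + sep + new_entry
    return new_entry


value = None  # start from an unset variable
for entry in ("new_entry_1", "new_entry_2"):
    value = append_entry(value, entry)
print(value)  # new_entry_1:new_entry_2
```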
Finally: if you’re still using $PYTHONPATH, be sure to always use absolute
paths!
There are a bunch of variant unsafe behaviors related to inspecting files in
your Downloads folder by doing anything interactive with Python. Other risky
activities:
- Running python ~/Downloads/anything.py (even if anything.py is itself
  safe) from anywhere, as it will add your downloads folder to sys.path by
  virtue of anything.py’s location.
- Jupyter Notebook puts the directory that the notebook is in onto sys.path,
  just like Python puts the script directory there. So
  jupyter notebook ~/Downloads/anything.ipynb is just as dangerous as
  python ~/Downloads/anything.py.
Get those scripts and notebooks out of your downloads folder before you run ’em!
But cd Downloads and then doing anything interactive remains a problem too:
- Running a python -c command that includes an import statement while in
  your ~/Downloads folder
- Running python interactively and importing anything while in your
  ~/Downloads folder
Remember that ~/Downloads/ isn’t special; it’s just one place where
unexpected files with attacker-chosen filenames might sneak in. Be on the
lookout for other locations where this is true. For example, if you’re
administering a server where the public can upload files, make extra sure
that neither your application nor any administrator who might run python
ever does cd public_uploads.
Maybe consider changing the code that handles uploads to mangle file names to
put a .uploaded at the end, avoiding the risk of a .py file getting
uploaded and executed accidentally.
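One possible shape for that mangling, as a sketch only: mangle_upload_name is a hypothetical helper, and a real upload handler would need more than this (collision handling, length limits, and so on).

```python
import os


def mangle_upload_name(filename):
    """Return a storage name that no longer ends in an importable suffix."""
    # Drop any directory components supplied by the client, treating
    # backslashes as separators too.
    base = os.path.basename(filename.replace("\\", "/"))
    # Appending a suffix means "pip.py" is stored as "pip.py.uploaded",
    # which the import system will not pick up as a module named pip.
    return base + ".uploaded"


print(mangle_upload_name("pip.py"))  # pip.py.uploaded
```

The original name can still be kept in a database column or metadata file if users need to see it again on download.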
Mitigations
If you have tools written in Python that you want to use while in your
downloads folder, make a habit of typing the path to the script
(/path/to/venv/bin/pip) rather than the module
(/path/to/venv/bin/python -m pip).
In general, just avoid ever having ~/Downloads as your current working
directory, and move any software you want to use to a more appropriate location
before launching it.
It’s important to understand where Python gets the code that it’s going to be
executing. Giving someone the ability to execute even one line of arbitrary
Python is equivalent to giving them full control over your computer!
Why I wrote this article
When writing a “tips and tricks” article like this about security, it’s very
easy to imply that I, the author, am very clever for knowing this weird bunch
of trivia, and the only way for you, the reader, to stay safe, is to memorize a
huge pile of equally esoteric stuff and constantly be thinking about it.
Indeed, a previous draft of this post inadvertently did just that. But that’s
a really terrible idea and not one that I want to have any part in propagating.
So if I’m not trying to say that, then why post about it? I’ll explain.
Over many years of using Python, I’ve infrequently, but regularly, seen users
confused about the locations that Python loads code from. One variety of this
confusion is when people put their first program that uses Twisted into a file
called twisted.py. That shadows the import of the library, breaking
everything. Another manifestation of this confusion is a slow trickle of
confused security reports where a researcher drops a module into a location
where Python is documented to load code from (like the current directory in
the scenarios described above) and then loads it, thinking that this reflects
an exploit because it’s executing arbitrary code.
Any confusion like this — even if the system in question is “behaving as
intended”, and can’t readily be changed — is a vulnerability that an attacker
can exploit.
System administrators and developers are high-value targets in the world of
cybercrime. If you hack a user, you get that user’s data; but if you hack an
admin or a dev, and you do it right, you could get access to thousands of users
whose systems are under the administrator’s control or even millions of users
who use the developers’ software.
Therefore, while “just be more careful all the time” is not a sustainable
recipe for safety, to some extent, those
of us acting on our users’ behalf do have a greater obligation to be more
careful. At least, we should be informed about the behavior of our tools.
Developer tools, like Python, are inevitably power tools which may require more
care and precision than the average application.
Nothing I’ve described above is a “bug” or an “exploit”, exactly; I don’t think
that the developers of Python or Jupyter have done anything wrong; the system
works the way it’s designed and the way it’s designed makes sense. I
personally do not have any great ideas for how things could be changed without
removing a ton of power from Python.
One of my favorite safety inventions is the
SawStop. Nothing was
wrong with the way table saws worked before its invention; they were
extremely dangerous tools that performed an important industrial function. A
lot of very useful and important things were made with table saws. Yet, it was
also true that table saws were responsible for a disproportionate share of
wood-shop accidents, and, in particular, lost fingers. Despite plenty of care
taken by experienced and safety-conscious carpenters, the SawStop still saves
many fingers every year.
So by highlighting this potential danger I also hope to provoke some thinking
among some enterprising security engineers out there. What might be the
SawStop of arbitrary code execution for interactive interpreters? What
invention might be able to prevent some of the scenarios I describe above
without significantly diminishing the power of tools like Python?
Stay safe out there, friends.
Acknowledgments
Thanks very much to Paul Ganssle, Nathaniel J. Smith, Itamar Turner-Trauring
and Nelson Elhage for substantial feedback on earlier drafts of this post.
Any errors remain my own.