I previously wrote a post about shipping a PyGame app to users on macOS. It’s now substantially updated for the new Notarization requirements in Catalina. I hope it’s useful to somebody!
Notarize your Python apps for macOS Catalina.
PyPI credentials are important. Here are some tips for securing them a little better.
Too Many Secrets
A wise man once said, “you shouldn’t use ENV variables for secret data.”
In large part, he was right, for all the reasons he gives (and you should read
them). Filesystem locations are usually a better operating system interface to
communicate secrets than environment variables; fewer things can intercept an
open() than can read your process’s command-line or calling environment.
One might say that files are “more secure” than environment variables. To his credit, Diogo doesn’t, for good reason: one shouldn’t refer to the superiority of such a mechanism as being “more secure” in general, but rather, as better for a specific reason in some specific circumstance.
Supplying your PyPI password to tools you run on your personal machine is a very different case than providing a cryptographic key to a containerized application in a remote datacenter. In this case, based on the constraints of the software presently available, I believe an environment variable provides better security, if you use it correctly.
Popping A Shell By Any Other Name
If you upload packages to the Python Package Index, and people use those packages, your PyPI password is an extremely high-privilege credential: effectively, it grants a time-delayed arbitrary code execution privilege on all of the systems where anyone might pip install your packages.
Unfortunately, the suggested mechanism to manage this crucial, potentially world-destroying credential is to just stick it in an unencrypted file.
The authors of this documentation know this is a problem; the authors of the tooling know too (and, given that these tools are all open source and we all could have fixed them to be better about this, we should all feel bad).
Leaving the secret lying around on the filesystem is a form of ambient authority; a permission you always have, but only sometimes want. One of the worst things about this is that you can easily forget it’s there if you don’t use these credentials very often.
The keyring is a much better place, but even it can be a slightly scary place to put such a thing, because it’s still easy to put it into a state where some random command could upload a PyPI release without prompting you. PyPI is forever, so we want to measure twice and cut once.
Luckily, even more secure places exist: password managers. If you use https://1password.com or https://www.lastpass.com, both offer command-line interfaces that integrate nicely with PyPI. If you use 1password, you’ll really want jq, from https://stedolan.github.io/jq/ (apt-get install jq), to slice & dice its command-line output.
The way that I manage my PyPI credentials is that I never put them on my filesystem, or even into my keyring; instead, I leave them in my password manager, and very briefly toss them into the tools that need them via an environment variable.
First, I have the following shell function, to prevent any mistakes:
This way I can debug Twine, my
setup.py, and various test-upload things
without ever needing real credentials at all.
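A minimal sketch of what such a guard might look like. The function name dev_twine and the error message are hypothetical, chosen for illustration (an underscore is used so the sketch also works in plain POSIX shells); TWINE_REPOSITORY_URL is the environment variable Twine reads for its upload target, and the URL is Test PyPI's upload endpoint.

```shell
# Shadow the bare `twine` command: a shell function takes precedence over
# the executable on PATH, so an unqualified upload cannot happen by accident.
twine() {
    echo "Use dev_twine or prod_twine, depending on where you want to upload." >&2
    return 1
}

# Hypothetical helper: run the real twine, but only against Test PyPI.
# `command` skips shell function lookup, so this reaches the executable
# rather than the guard above.
dev_twine() {
    TWINE_REPOSITORY_URL="https://test.pypi.org/legacy/" command twine "$@"
}
```

With both defined, `twine upload dist/*` refuses to run, while `dev_twine upload dist/*` only ever talks to the test index.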
But, OK. Eventually, I need to actually get the credentials and do the thing. How does that work?
1password’s command line is a little tricky to log in to (you have to eval its output, it’s not just a command), so here’s a handy shell function that will do it.
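A rough sketch of such a login helper. The name op_login is hypothetical, and the sketch assumes that a bare `op signin` is enough to pick the account; depending on the CLI version you may need to pass an account shorthand as well.

```shell
# Hypothetical helper: sign this shell in to 1Password unless it already
# has a session.
op_login() {
    if ! env | grep -q '^OP_SESSION_'; then
        # `op signin` prints an `export OP_SESSION_...` line instead of
        # changing the environment itself, so its output has to be eval'd.
        eval "$(op signin)"
    fi
}
```

Because the session variable only lives in the shell that ran the eval, other terminals stay logged out.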
Then, I have this little helper for slicing out a particular field from the OP JSON structure:
And finally, I use this to grab the item I want (named, memorably enough, “PyPI”) and invoke Twine:
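A sketch of how the field helper and the upload wrapper might fit together, assuming the shell is already signed in to 1Password. The names op_field and prod_twine (underscore spelling) are hypothetical, `op get item` is the older CLI's syntax, and the JSON layout shown in the comment is an assumption about that older CLI's output; newer versions differ on both counts.

```shell
# Hypothetical helper: read an item's JSON on stdin and print the value of
# the field whose "name" matches the first argument.
# Assumed layout: {"details": {"fields": [{"name": "...", "value": "..."}]}}
op_field() {
    jq -r --arg name "$1" '.details.fields[] | select(.name == $name) | .value'
}

# Fetch the item named "PyPI" once, then hand its username and password to
# the real twine through the environment of that single process only.
prod_twine() {
    item_json="$(op get item PyPI)" || return 1
    TWINE_USERNAME="$(printf '%s' "$item_json" | op_field username)" \
    TWINE_PASSWORD="$(printf '%s' "$item_json" | op_field password)" \
        command twine "$@"
    status=$?
    unset item_json   # do not leave the item lying around in the shell
    return "$status"
}
```

The credentials are set as prefix assignments, so they exist in the Twine process and nowhere else once it exits.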
For lastpass, you can just log in (for all shells; it’s a little less secure) via lpass login; if you’ve logged in before you often don’t even have to do that, and it will just prompt you when running commands that require you to be logged in; so we don’t need the preamble that 1password’s command line did.
Its version of prod.twine looks quite similar, but its plaintext output obviates the need for jq:
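A minimal sketch of a LastPass-backed variant, assuming the entry is also named “PyPI”. The function name (underscore spelling) is hypothetical; `lpass show --username` and `lpass show --password` each print a single field as plain text.

```shell
# Ask LastPass for each field directly and pass both to the real twine
# through the environment of that one process.
prod_twine() {
    TWINE_USERNAME="$(lpass show --username PyPI)" \
    TWINE_PASSWORD="$(lpass show --password PyPI)" \
        command twine "$@"
}
```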
“Keep secrets out of your environment” is generally a good idea, and you should always do it when you can. But, better a moment in your process environment than an eternity on your filesystem. Environment-based configuration can be a very useful stopgap for limiting the lifetimes of credentials when your tools don’t support more sophisticated approaches to secret storage.[1]
If you are interested in secure secret storage, my micro-project
secretly might be of interest. Right
now it doesn’t do a whole lot; it’s just a small wrapper around the excellent
keyring module and the
pinentry-mac password prompt tools.
secretly presents an interface both for prompting users for their credentials
without requiring the command-line or env vars, and for saving them away in
keychain, for tools that need to pull in an API key and don’t want to make
the user manually edit a config file first.
[1] Really, PyPI should have API keys that last for some short amount of time, that automatically expire so you don’t have to freak out if you gave somebody a 5-year-old laptop and forgot to wipe it first. But again, if I wanted that so bad, I should have implemented it myself...
setup.py is your friend. It’s real sorry about what happened last time.
Okay folks. Time’s up. It’s too late to say that Python’s packaging ecosystem is terrible any more. I’m calling it.
Python packaging is not bad any more. If you’re a developer, and you’re trying to create or consume Python libraries, it can be a tractable, even pleasant experience.
I need to say this, because for a long time, Python’s packaging toolchain was … problematic. It isn’t any more, but a lot of people still seem to think that it is, so it’s time to set the record straight.
If you’re not familiar with the history it went something like this:
Python first shipped in an era when adding a dependency meant a veritable Odyssey into cyberspace. First, you’d wait until nobody in your whole family was using the phone line. Then you’d dial your ISP. Once you’d finished fighting your SLIP or PPP client, you’d ask a netnews group if anyone knew of a good gopher site to find a library that could solve your problem. Once you were done with that task, you’d sign off the Internet for the night, and wait about 48 hours to see if anyone responded. If you were lucky enough to get a reply, you’d set up a download at the end of your night’s web-surfing.
pip search it wasn’t.
For the time, Python’s approach to dependency-handling was incredibly forward-looking. The import statement, and the pluggable module import system, made it easy to get dependencies from wherever made sense.
In Python 2.0, Distutils was introduced. This let Python developers describe their collections of modules abstractly, and added tool support for producing redistributable collections of modules and packages. Again, this was tremendously forward-looking, if somewhat primitive; there was very little to compare it to at the time.
Fast forwarding to 2004, setuptools was created to address some of the
increasingly-common tasks that open source software maintainers were facing
with distributing their modules over the internet. In 2005, it added
easy_install, in order to provide a tool to automate resolving dependencies
and downloading them into the right locations.
The Dark Age
Unfortunately, in addition to providing basic utilities for expressing dependencies, setuptools also dragged in a tremendous amount of complexity.
Its author felt that import should do something slightly different than what it does, so installing setuptools changed it. The main difference between normal import and setuptools import was that it facilitated having multiple different versions of the same library in the same program at the same time. It turns out that that’s a dumb idea, but in fairness, it wasn’t entirely clear at the time, and it is certainly useful (and necessary!) to be able to have multiple versions of a library installed onto a computer at the same time.
In addition to these idiosyncratic departures from standard Python semantics,
setuptools suffered from being unmaintained. It became a critical part of
the Python ecosystem at the same time as the author was moving on to
other projects entirely outside of programming.
No-one could agree on who the new maintainers should be for a long period of time. The project was forked, and many operating systems’ packaging toolchains calcified around a buggy, ancient version of setuptools.
From 2008 to 2012 or so, Python packaging was a total mess. It was painful to use. It was not clear which libraries or tools to use, which ones were worth investing in or learning. Doing things the simple way was too tedious, and doing things the automated way involved lots of poorly-documented workarounds and inscrutable failure modes.
This is to say nothing of the fact that there were critical security flaws in various parts of this toolchain. There was no practical way to package and upload Python packages in such a way that users didn’t need a full compiler toolchain for their platform.
To make matters worse for the popular perception of Python’s packaging prowess, at this same time, newer languages and environments were getting a lot of buzz, ones that had packaging built in at the very beginning and had a much better binary distribution story. These environments learned lessons from the screw-ups of Python and Perl, and really got a lot of things right from the start.
Finally, the Python Package Index, the site which hosts all the open source packages uploaded by the Python community, was basically a proof-of-concept that went live way too early, had almost no operational resources, and was offline all the dang time.
Things were looking pretty bad for Python.
Here is where we get to the point of this post: this is where popular opinion about Python packaging is stuck. Outdated information from this period abounds. Blog posts complaining about problems score high in web searches. Those who used Python during this time, but have now moved on to some other language, frequently scoff and dismiss Python as impossible to package, its packaging ecosystem as broken, PyPI as down all the time, and so on. Worst of all, bad advice about workarounds which are no longer necessary is still easy to find, which causes users to pre-emptively break their environments where they really don’t need to.
From The Ashes
In the midst of all this brokenness, there were some who were heroically, quietly, slowly fixing the mess, one gnarly bug-report at a time. pip was started, and its various maintainers fixed much of easy_install’s overcomplexity and many of its flaws. Donald Stufft stepped in both on Pip and PyPI and improved the availability of the systems it depended upon, as well as fixing some pretty serious vulnerabilities in the tool itself. Daniel Holth wrote a PEP for the wheel format, which allows for binary redistribution of libraries. In other words, it lets authors of packages which need a C compiler to build give their users a way to not have one.
The setuptools fork was eventually merged back into setuptools, providing a path forward for operating system vendors to start updating their installations and allowing users to use something modern.
Python Core started distributing the ensurepip module along with both Python 2.7 and 3.4, allowing any user with a recent Python installed to quickly bootstrap into a sensible Python development environment with a one-liner.
A New Renaissance
I won’t give you a full run-down of the state of the packaging art. There’s already a website for that. I will, however, give you a précis of how much easier it is to get started nowadays. Today, if you want to get a sensible, up-to-date Python development environment, without administrative privileges, all you have to do is:
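A minimal sketch of that bootstrap, wrapped in a hypothetical function (bootstrap_python_tools) so it can live in a shell profile. The PYTHON variable is likewise an assumption, there only to let you pick an interpreter; the three commands themselves are standard ensurepip and pip invocations, and --user keeps everything out of system directories.

```shell
# Hypothetical helper: make sure pip exists, then bring pip and virtualenv
# up to date for the current user only (no administrative privileges needed).
bootstrap_python_tools() {
    py="${PYTHON:-python}"
    "$py" -m ensurepip --user &&
    "$py" -m pip install --user --upgrade pip &&
    "$py" -m pip install --user --upgrade virtualenv
}
```

Some OS-packaged Pythons disable ensurepip; in that case pip is usually installed already and the first step can be dropped.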
Then, for each project you want to do, make a new virtualenv:
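As a sketch, creating and activating a per-project environment might look like this. The function name new_project_env and the PYTHON variable are hypothetical; the activate path assumes a Unix-like layout.

```shell
# Hypothetical helper: create a virtualenv in the given directory and
# activate it in the current shell.
new_project_env() {
    "${PYTHON:-python}" -m virtualenv "$1" &&
    . "$1/bin/activate"
}
```

Calling `new_project_env lets-go` leaves the shell inside the new environment, so anything installed afterwards with pip stays local to that project.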
From here on out, the world is your oyster; you can pip install to your heart’s content, and you probably won’t even need to compile any C for most packages. These instructions don’t depend on Python version, either: as long as it’s up-to-date, the same steps work on Python 2, Python 3, PyPy and even Jython. In fact, often the ensurepip step isn’t even necessary since pip comes preinstalled. Running it if it’s unnecessary is harmless, even!
Other, more advanced packaging operations are much simpler than they used to be, too.
- Need a C compiler? OS vendors have been working with the open source community to make this easier across the board:
    $ apt install build-essential python-dev # ubuntu
    $ xcode-select --install # macOS
    $ dnf install @development-tools python-devel # fedora
    C:\> REM windows
    C:\> start https://www.microsoft.com/en-us/download/details.aspx?id=44266
Okay that last one’s not as obvious as it ought to be but they did at least make it freely available!
- Want to upload some stuff to PyPI? This should do it for almost any project:
    $ pip install twine
    $ python setup.py sdist bdist_wheel
    $ twine upload dist/*
Importantly, PyPI will almost certainly be online. Not only that, but a new, revamped site will be “launching” any day now.
Again, this isn’t a comprehensive resource; I just want to give you an idea of what’s possible. But, as a deeply experienced Python expert, I used to swear at these tools six times a day for years; the most serious Python packaging issue I’ve had this year to date was fixed by cleaning up my git repo to delete a cache file.
Work Still To Do
While the current situation is good, it’s still not great.
Here are just a few of my desiderata:
- We still need better and more universally agreed-upon tooling for end-user deployments.
- Pip should have a GUI frontend so that users can write Python stuff without learning as much command-line arcana.
- There should be tools that help you write and update a setup.py. Or a setup.python.json or something, so you don’t actually need to write code just to ship some metadata.
- The error messages that you get when you try to build something that needs a C compiler and it doesn’t work should be clearer and more actionable for users who don’t already know what they mean.
- PyPI should automatically build wheels for all platforms by default when you upload sdists; this is a huge project, of course, but it would be super awesome default behavior.
I could go on. There are lots of ways that Python packaging could be better.
The Bottom Line
The real takeaway here, though, is that although it’s still not perfect, other languages are no longer doing appreciably better. Go is still working through a number of different options regarding dependency management and vendoring, and, like Python extensions that require C dependencies, CGo is sometimes necessary and always a problem. Node has had its own well-publicized problems with its dependency management culture and package manager. Hackage is cool and all but everything takes a literal geological epoch to compile.
As always, I’m sure none of this applies to Rust and Cargo is basically perfect, but that doesn’t matter, because nobody reading this is actually using Rust.
My point is not that packaging in any of these languages is particularly bad. They’re all actually doing pretty well, especially compared to the state of the general programming ecosystem a few years ago; many of them are making regular progress towards user-facing improvements.
My point is that any commentary suggesting they’re meaningfully better than Python at this point is probably just out of date. Working with Python packaging is more or less fine right now. It could be better, but lots of people are working on improving it, and the structural problems that prevented those improvements from being adopted by the community in a timely manner have almost all been addressed.
Go! Make some virtualenvs! Hack some setup.pys! If it’s been a while and your last experience was really miserable, I promise, it’s better now.
Am I wrong? Did I screw up a detail of your favorite language? Did I forget to mention the one language environment that has a completely perfect, flawless packaging story? Do you feel the need to just yell at a stranger on the Internet about picayune details? Feel free to get in touch!