A few years back, I published The Elements of Python Style, a popular Python code style guide. Since publishing it, friends of mine in the Python community have wondered if I might consider adding a section about package installation, dependency management, and other similar “standard tooling” recommendations.
This is a reasonable request, since Python lacks much in the way of standard tooling. I took a stab at this in a pull request, but then abandoned the attempt: the length of this “section” started to approach the length of the overall style guide itself! So, I decided to turn the section into this blog post instead. At the end, I’ll also share one thought about how emerging programming language communities, such as Zig’s, could learn from the Python experience.
On “Standard” Tools
There’s a zoo of tooling options out there, and no “standard” Python tooling beyond the
python executable, and, perhaps,
pip (for installing packages, which was semi-formalized in Python 3.x with PEP 453 and PEP 508). Here, we’ll discuss an opinionated (yet uncontroversial) approach to “standard” tooling with Python.
Build and deploy
It’s generally unnecessary to use any sort of “build tool” with Python, since development usually involves running
python commands against your source tree directly. However, as will be described below, it is very common to include a
pyproject.toml file in your source root as an entry point for building packages from your source code.
That said, you’ll often find very simple and minimalist GNU
make files (named
Makefile) in use on Python projects. These are usually small files that simply list commands for finding dependencies, packaging, linting, testing, and so on.
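To make that concrete, here is a minimal sketch of such a Makefile; the target names and the src/ and tests/ paths are illustrative conventions, not requirements:

```make
# Minimal Makefile sketch for a Python project.
# Target names and paths are illustrative, not standard.

.PHONY: deps lint test

deps:
	pip install -r requirements.txt

lint:
	flake8 src/ tests/

test:
	python -m pytest tests/
```

Running make deps, make lint, or make test then gives contributors one obvious entry point for each common task, without coupling the project to a heavier build system.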
I’d recommend against coupling your Python project to one of the myriad generic build tools out there, like SCons or Bazel, if it can be avoided.
When you need to automate deployment for your Python project (e.g. for web applications), you’re likely going to adopt something like Fabric or Ansible, which offers some tooling akin to
make, but with the added capabilities of a full-blown Python API and the ability to manage remote servers via SSH.
Linting and formatting
flake8 is a good choice for linting, as it combines the
pyflakes, pycodestyle, and mccabe checks into a single tool, and that’s usually all you need on the linting side.
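As a quick sketch of what that looks like in practice (the file name and its contents here are made up purely for illustration):

```shell
# Install flake8 (which bundles the pyflakes, pycodestyle, and mccabe checks).
pip install flake8

# Create a file with a deliberate problem: an unused import.
printf 'import os\n' > example.py

# flake8 prints each issue it finds and exits nonzero when there are any.
flake8 example.py || echo "lint issues found"
```

flake8 reports the unused import as an F401 error, with the file, line, and column where it occurred.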
Environment management

Because operating systems differ dramatically in which version of Python they ship, and in how they manage Python dependencies, you’ll very likely find yourself in need of an environment manager for your Python code.
For local development & local dependency environments,
pyenv is the gold standard. When used together with its included
pyenv-virtualenv plugin, it is very future-proof and solid.
This is because
pyenv can manage plain CPython installations, both future ones and historical ones; it can manage Conda environments via
miniconda; it lets you run Python 2 and Python 3 side-by-side; it even supports PyPy, for situations where you need that; and, via
pyenv-virtualenv, it lets you layer “virtual environments” over your installed Python versions. This lets you isolate dependencies between your several Python projects. It’s also a good choice for simple Python “environment-based” deployments to remote servers. You can read this detailed StackOverflow answer on why this is a solid choice.
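A typical workflow, assuming pyenv and the pyenv-virtualenv plugin are already installed (the version number and environment name below are arbitrary):

```shell
pyenv install 3.10.13            # build and install a specific CPython version
pyenv virtualenv 3.10.13 myapp   # layer a virtualenv named "myapp" over it
pyenv local myapp                # pin this directory to it via a .python-version file
python -V                        # now resolves through pyenv's shims to that environment
```

Because the pin lives in a .python-version file, anyone who later clones the project with pyenv installed gets pointed at the right interpreter automatically.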
For installing dependencies, you’ll want to avoid the debates going on in the community related to
pipenv and poetry, and stick with plain pip and a requirements.txt file.
Most Pythonistas, upon cloning a Python project, instinctively look for a
requirements.txt file so that they can run the incantation
pip install -r requirements.txt to fetch your dependencies.
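For illustration, a requirements.txt is just a plain list of requirement specifiers, one per line; the package names and version ranges here are examples, not recommendations:

```text
requests>=2.28,<3
flask==2.3.3
```

pip reads the file top to bottom and installs each entry, so there is no extra format to learn.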
You can also layer on
pip-tools if you need version pinning; it is being actively maintained, and its logic is even re-used by some other dependency manager projects. It has some great docs with usage instructions.
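The pip-tools workflow, sketched here with a hypothetical requirements.in file: you list only your direct dependencies, and pip-compile resolves and pins the full tree.

```shell
pip install pip-tools

# requirements.in contains only your direct, loosely-pinned dependencies, e.g.:
#   requests
#   flask>=2

pip-compile requirements.in   # writes requirements.txt with exact pins for every transitive dependency
pip-sync requirements.txt     # makes the current environment match the pinned file exactly
```

The compiled requirements.txt stays compatible with the plain pip install -r incantation above, which is part of what makes pip-tools a low-risk addition.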
Including a setup.py file (or, more recently, a
pyproject.toml file) is a good idea if you’re publishing your library as a formal dependency to a private or public PyPI server.
Yes, it involves some boilerplate to set up initially, but it’s generally a “set-it-and-forget-it” thing. Don’t overthink it. The Python Packaging Authority (PyPA) has a nice packaging tutorial that covers this ground.
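For a sense of how little boilerplate is involved, here is a minimal, illustrative pyproject.toml using the setuptools backend; the project name, version, and dependency are placeholders:

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "yourpackage"
version = "0.1.0"
dependencies = ["requests>=2.28"]
```

With this in place (and after pip install build), python -m build produces the sdist and wheel you would upload to a PyPI server.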
Some historical commentary on packaging in Python
There’s quite a lot of history behind Python’s packaging options — spanning dependency management, resolution, code packaging, and deployment. This makes sense, given that packaging was never taken up by the Python core team, and thus developed in the open among many open source communities. This has, however, sent a confusing message to new users about “what is standard”.
Python started with
easy_install, and later added
pip, which was a definite improvement over
easy_install. Then people realized that version pinning was useful for the way Python was deployed, so someone built
pip-tools. Around the same time, Anaconda, one of the commercial sponsors of the scientific Python community that often faced dependency hell, worked on
conda, which solved some very important heavyweight dependency management issues “in anger”. Even Guido van Rossum, Python’s creator, once told the Anaconda team that packaging was unlikely to be tackled by the core team, and thus indirectly greenlit the development of
conda as an on-going and active project (which it remains to this day).
In the last couple of years, some well-known Python F/OSS folks built
pipenv and poetry. These are great projects, but they are, put simply, newer and fancier alternatives to
pip. So, we face a paradox of choice. It’s just the free-wheeling nature of a very open F/OSS community, especially since the Python core team has decided not to “bless” any one or another packaging/installer tool (though they have elevated
pip in Python 3.x, ratified
pyproject.toml for packaging via PEP 518, and standardized
wheel as a distribution format via PEP 427).
When one really thinks about it, though, the only “schism” in the community is between PyPI and Conda.
For example, if you want to install
pyspark, the Python API for Apache Spark, you’ll find very different results between PyPI and conda-forge. In the case of PyPI, installing
pyspark only installs the Python code necessary to run
import pyspark successfully. But it won’t install the “implied dependencies”, such as Java/JDK, Scala, and the Apache Spark framework itself. In the case of conda-forge, however, installing
pyspark gives you that Python code, as well as the full-blown managed installation of Java/JDK, Scala, and Apache Spark.
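The two installs look nearly identical on the command line, which is exactly what makes the difference in behavior surprising:

```shell
# From PyPI: just the Python package; the JVM-side pieces are your problem.
pip install pyspark

# From conda-forge: the Python package plus a managed JDK and the Spark runtime.
conda install -c conda-forge pyspark
```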
So, one other way to think about this is that, typically, PyPI manages “only” Python modules, and serves as the official package index of record for almost all Python packages. Whereas, conda-forge hosts many Python projects (modules, command-line tools, etc.), plus many other Python-affiliated projects, regardless of underlying language or implementation, with a bias toward supporting scientific computing packages especially well. This means
conda will be very convenient if you’re a data scientist, but will likely be overkill for most Python projects. Thus my recommendation to start out with
pip-tools, and PyPI.
Conclusion, and a note on Zig
The truth is, it’d be better for the Python community if dependency management and packaging were handled in the core. Though not having this solved in the core has led to a lot of experimentation in the broader Python open source community (
conda has a SAT solver!
pipenv scans for CVEs!), it has also led to a lot of fragmentation.
This fragmentation is to such an extent that I can’t even make clear recommendations about it in a Python style guide without exploding its word count by a factor of two.
For Python, it’s too late now. We are where we are.
But, other emerging programming languages like Zig can learn from this experience. It’s worth it to solve this problem in the core. This will save fragmentation pain and allow for clear recommendations for beginning programmers. But, it’s also important to solve it well in the core, as the world’s experience with Node.js,
node_modules, and super-deep dependency trees has made clear.
Perhaps by combining the ideas of “dependency rejection” common in the C community with the spirit of open sharing that pervades the Python community (via PyPI), a new programming language like Zig can carve a new path that prevents community-wide tooling fragmentation, while still enabling community-wide sharing and code re-use. This is a magic moment to tackle the dependency and packaging tooling problem from scratch. Zig has already done so well on its built-in tooling for building, testing, and formatting code. It now has a great opportunity to get tooling for dependencies and packaging right, as well.
Note: As a helpful recap, here is a quick list of the Python dependency and package management tools discussed in this post. The recommended “uncontroversial” (perhaps, “minimalist”) tools: pip; pyenv; pyenv-virtualenv; pip-tools. There are also fancier tools available in the wider community, but these are not recommended for those starting out, and I referenced them only to illustrate Python’s tooling fragmentation in the dependency/packaging space: conda; miniconda; Pipenv; Poetry.