How Python programmers can uncontroversially approach build, dependency, and packaging tooling (+ a note on Zig)

A few years back, I published The Elements of Python Style, a popular Python code style guide. Since publishing it, friends of mine in the Python community have wondered if I might consider adding a section about package installation, dependency management, and other similar “standard tooling” recommendations.

This is a reasonable request, since Python lacks much in the way of standard tooling. I took a stab at this in a pull request, but abandoned the attempt when the length of this “section” started to approach the length of the overall style guide itself. Instead, I decided to turn the section into this blog post. At the end, I’ll also share one thought about how emerging programming language communities, such as Zig’s, could learn from the Python experience.

On “Standard” Tools

There’s a zoo of tooling options out there, and no “standard” Python tooling beyond the python executable and, perhaps, pip (for installing packages; pip was semi-formalized in Python 3.4 via PEP 453, which bundles it with the interpreter through ensurepip, and its dependency specifiers were later standardized in PEP 508). Here, we’ll discuss an opinionated (yet uncontroversial) approach to “standard” tooling with Python.

Build and deploy

It’s generally unnecessary to use any sort of “build tool” with Python, since development usually involves running python commands against your source tree directly. However, as will be described below, it is very common to include a setup.py or pyproject.toml file in your source root as an entry point for building packages from your source code.

That said, you’ll often find very simple and minimalist GNU Make files (named Makefile) in use on Python projects. These are usually small files that simply list commands for installing dependencies, packaging, linting, testing, and so on.
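For example, a minimal sketch of such a Makefile might look like this (the target names and the requirements.txt convention are illustrative, not a standard):

```make
# Minimal Makefile for a Python project; targets are illustrative conventions.
# Note: recipe lines must be indented with a tab, not spaces.
.PHONY: deps lint fmt test

deps:        ## install dependencies
	pip install -r requirements.txt

lint:        ## run the linter
	flake8 .

fmt:         ## auto-format the code
	black .

test:        ## run the test suite
	pytest
```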

I’d recommend against coupling your Python project to one of the myriad generic build tools out there, like SCons or Bazel, if it can be avoided.

When you need to automate deployment for your Python project (e.g. for web applications), you’re likely going to adopt something like Fabric or Ansible, which offers some tooling akin to make, but with the added capabilities of a full-blown Python API and the ability to manage remote servers via SSH.
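As a rough sketch of what that looks like with Fabric (the 2.x API), here is a hypothetical deploy task; the host, paths, and service name are made-up placeholders:

```python
# fabfile.py: a hypothetical deploy task using the Fabric 2.x API.
# The host, paths, and service name below are illustrative only.
from fabric import Connection, task

@task
def deploy(c):
    conn = Connection("deploy@app.example.com")
    with conn.cd("/srv/myapp"):
        conn.run("git pull --ff-only")
        conn.run("pip install -r requirements.txt")
    conn.sudo("systemctl restart myapp")
```

You’d then run fab deploy from the project root.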

Linting and formatting

flake8 is a good choice for linting, as it combines pycodestyle (formerly known as pep8) with pyflakes, and that’s usually all you need on the linting side.

It’s very common in the community to use the black formatter, which is similar in principle to gofmt, the Go code formatter. But this is optional.
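Both tools can be configured from small files at the project root. A minimal, illustrative setup (88 is black’s default line length) might be:

```ini
# setup.cfg: flake8 configuration (flake8 does not read pyproject.toml)
[flake8]
max-line-length = 88
extend-ignore = E203    # E203 conflicts with black's slice formatting
```

```toml
# pyproject.toml: black configuration
[tool.black]
line-length = 88
```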

Testing

The pytest framework is totally fine and very popular, but you also won’t get any odd looks for sticking with unittest, or even doctest, from the stdlib.

For property-based testing (aka “Quickcheck”-style testing), you can layer on hypothesis. For code coverage statistics on your tests, you can layer on pytest-cov.
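To make that concrete, here’s a small sketch combining a plain pytest test with a hypothesis property-based test; my_sort is a hypothetical function standing in for your own code:

```python
# test_sorting.py: run with `pytest`; add `pytest --cov` once pytest-cov is installed.
from hypothesis import given, strategies as st

def my_sort(xs):
    # a stand-in function for illustration
    return sorted(xs)

def test_my_sort_simple():
    # a plain pytest-style example test
    assert my_sort([3, 1, 2]) == [1, 2, 3]

@given(st.lists(st.integers()))
def test_my_sort_matches_builtin(xs):
    # a hypothesis property: our sort agrees with the builtin on any list of ints
    assert my_sort(xs) == sorted(xs)
```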

Environments

Because operating systems differ dramatically in what version of Python they run, and how they manage Python dependencies, you’ll very likely find yourself in need of an environment manager for your Python code.

For local development & local dependency environments, pyenv is the gold standard. When used together with its companion pyenv-virtualenv plugin, it is very future-proof and solid.

This is because pyenv can manage plain CPython installations, both future ones and historical ones; it can manage Conda environments via miniconda; it lets you run Python 2 and Python 3 side-by-side; it even supports PyPy, for situations where you need that; and, via pyenv-virtualenv, it lets you layer “virtual environments” over your installed Python versions. This lets you isolate dependencies across your various Python projects. It’s also a good choice for simple Python “environment-based” deployments to remote servers. You can read this detailed StackOverflow answer on why this is a solid choice.
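In day-to-day use, that workflow looks something like this (the interpreter version and environment name are illustrative):

```sh
pyenv install 3.12.2                 # build and install a CPython version
pyenv virtualenv 3.12.2 myproject    # layer a virtualenv over it (pyenv-virtualenv)
pyenv local myproject                # pin this directory via a .python-version file
pip install -r requirements.txt      # deps now land in the isolated environment
```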

Dependencies

For installing dependencies, you’ll want to avoid the debates going on in the community related to pipenv & poetry, and stick with pip.

Most Pythonistas, upon cloning a Python project, instinctively look for a requirements.txt file so that they can run the incantation pip install -r requirements.txt to fetch the project’s dependencies.

You can also layer on pip-tools if you need version pinning; it is being actively maintained, and its logic is even re-used by some other dependency manager projects. It has some great docs with usage instructions.
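The usual pip-tools workflow is to list only your direct dependencies in a requirements.in file and compile the fully pinned requirements.txt from it; the package name here is a placeholder:

```sh
pip install pip-tools
echo "requests" >> requirements.in   # direct dependencies only, one per line
pip-compile requirements.in          # emits a fully pinned requirements.txt
pip-sync requirements.txt            # makes the active environment match exactly
```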

Packaging

Creating a setup.py file (or, more recently, a pyproject.toml file) is a good idea if you’re publishing your library as a formal dependency to a private or public PyPI server.

Yes, it involves some boilerplate to set up initially, but it’s generally a “set-it-and-forget-it” thing. Don’t overthink it. The Python Packaging Authority (PyPA) has a nice packaging tutorial that covers this ground.
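For instance, a minimal pyproject.toml using setuptools as the build backend looks roughly like this; the name, version, and dependencies are placeholders:

```toml
# pyproject.toml: minimal PEP 621 metadata with a setuptools build backend
[build-system]
requires = ["setuptools>=61"]       # 61+ understands the [project] table
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "0.1.0"
description = "A placeholder example package"
requires-python = ">=3.8"
dependencies = ["requests"]
```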

If you need binary distribution, wheel is a good choice; it is supported by the Python community through PEP 427, and the PyPA maintains the wheel project.
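With a pyproject.toml like the one above in place, building a wheel (plus an sdist) and publishing it is short work with the PyPA’s build and twine tools:

```sh
pip install build twine
python -m build          # writes dist/<name>-<version>.tar.gz and .whl
twine upload dist/*      # publishes to PyPI (or point it at a private index)
```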

Some historical commentary on packaging in Python

There’s quite a lot of history behind Python’s packaging options — spanning dependency management, resolution, code packaging, and deployment. This makes sense, given that packaging was never taken up by the Python core team, and thus developed in the open among many open source communities. This has, however, sent a confusing message to new users about “what is standard”.

Python started with setuptools and easy_install, and later added pip, which was a definite improvement over easy_install. Later still, people realized that version pinning was useful for the way Python was deployed, so someone built pip-tools. Around the same time, Anaconda, one of the commercial sponsors of the scientific Python community (a community that often faced dependency hell), worked on conda, which solved some very important heavyweight dependency management issues “in anger”. Even Guido van Rossum, Python’s creator, once told the Anaconda team that packaging was unlikely to be tackled by the core team, and thus indirectly greenlit the development of conda as an ongoing and active project (which it remains to this day).

In the last couple of years, a couple of well-known Python F/OSS folks built poetry and pipenv. They are great projects, but they are, put simply, newer and fancier alternatives to pip. So, we face a paradox of choice. It’s just the free-wheeling nature of a very open F/OSS community, especially since the Python core team has decided not to “bless” any one packaging/installer tool (though they have elevated pip in Python 3.x via PEP 453, ratified pyproject.toml for packaging via PEP 518, and encouraged wheel as a distribution format via PEP 427).

When one really thinks about it, though, the only “schism” in the community is between PyPI and Conda.

PyPI definitively plays the role in the Python community that npmjs.com plays in JavaScript or that maven.org plays in Java. Conda, on the other hand, is an “alternative packaging ecosystem” that focuses on more complex setup and deployment scenarios, especially those in data science or scientific computing.

For example, if you want to install pyspark, the Python API for Apache Spark, you’ll find very different results between PyPI and conda-forge. In the case of PyPI, installing pyspark gets you the Python code necessary to run import pyspark successfully (along with Spark’s bundled jars), but it won’t provision the non-Python “implied dependencies”, most notably the Java/JDK runtime that Spark requires. In the case of conda-forge, however, installing pyspark gives you that Python code as well as a full-blown managed installation of the Java/JDK alongside Spark.
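Concretely, the two installation paths look like this (the exact dependency set pulled in by conda-forge can vary over time):

```sh
pip install pyspark                     # Python package; Java/JDK is on you
conda install -c conda-forge pyspark    # also pulls in a managed openjdk
```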

So, one other way to think about this is that, typically, PyPI manages “only” Python modules, and serves as the official package index of record for almost all Python packages. Whereas, conda-forge hosts many Python projects (modules, command-line tools, etc.), plus many other Python-affiliated projects, regardless of underlying language or implementation, with a bias toward supporting scientific computing packages especially well. This means conda will be very convenient if you’re a data scientist, but will likely be overkill for most Python projects. Thus my recommendation to start out with pip, pip-tools, and PyPI.

Conclusion, and a note on Zig

The truth is, it’d be better for the Python community if dependency management and packaging were handled in the core. Though not having this solved in the core has led to a lot of experimentation in the broader Python open source community (conda has a SAT solver! pipenv scans for CVEs!), it has also led to a lot of fragmentation.

The fragmentation is so extensive that I can’t even make clear recommendations about it in a Python style guide without doubling its word count.

For Python, it’s too late now. We are where we are.

But, other emerging programming languages like Zig can learn from this experience. It’s worth it to solve this problem in the core. This will save fragmentation pain and allow for clear recommendations for beginning programmers. But, it’s also important to solve it well in the core, as the world’s experience with Node.js, npm, left-pad, node_modules, and super-deep dependency trees has made clear.

Perhaps by combining the ideas of “dependency rejection” common in the C community with the spirit of open sharing that pervades the Python community (via PyPI), a new programming language like Zig can carve a new path that prevents community-wide tooling fragmentation, while still enabling community-wide sharing and code re-use. This is a magic moment to tackle the dependency and packaging tooling problem from scratch. Zig has already done so well on its built-in tooling for building, testing, and formatting code. It now has a great opportunity to get tooling for dependencies and packaging right, as well.


Note: As a helpful recap, here is a quick list of the Python dependency and package management tools discussed in this post. The recommended “uncontroversial” (perhaps, “minimalist”) tools: pip; pyenv; pyenv-virtualenv; pip-tools. There are also fancier tools available in the wider community, but these are not recommended for those starting out, and I referenced them only to illustrate Python’s tooling fragmentation in the dependency/packaging space: conda; miniconda; Pipenv; Poetry.
