When I describe my programming background these days, I say that I code “primarily in Python, JavaScript, Clojure, C… and Zig!” I put Python first in that list for good reason.
This is a post about the core Python language, but also the ways in which Python is evolving its single-core and multi-core CPU performance.
Python has been my go-to programming tool for a long time. When I started to build out my last company and shipped the production core of its product, Python 2.7 had just stabilized, creating an excellent “core language.” This is a language that I truly respected, as evidenced by my style guide. And, as I discussed in my Python technical book review round-up, this core language was best described by David Beazley in the first half of his Python Essential Reference book, which was also turned into an excellent standalone volume (which includes Python 3.x coverage), Python Distilled.
Many, many useful open source projects, companies, and products were built atop that core Python 2.7 language foundation. Its community truly flourished.
But in the early days of my first company, the Python core team was in full swing with Python3000, improving the language in a number of directions, which would eventually become the main Python 3.x releases we all use and love today. (See also Python 3 is here and the sky is not falling.)
With the benefit of hindsight, it’s clear that only a few Python 3.x features truly took the language in a different direction — and always in an add-on-to-the-core sort of way. For example: gradual typing and asyncio. These features were big and important. But, they didn’t fundamentally change the kinds of programs you could write in Python. They instead improved certain approaches to software craftsmanship — allowing for the documentation of large codebases (with type hints) and allowing for a simpler approach to asynchronous I/O (with event loops).
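To give a flavor of how these additions layer onto the core language rather than replace it, here is a minimal, illustrative sketch combining type hints with asyncio (the function names and workload are my own, not from the original projects):

```python
import asyncio

# Type hints document the interface without changing runtime behavior.
async def fetch_greeting(name: str, delay: float = 0.1) -> str:
    # Simulate asynchronous I/O by yielding to the event loop.
    await asyncio.sleep(delay)
    return f"hello, {name}"

async def main() -> None:
    # Run two coroutines concurrently on a single thread.
    greetings = await asyncio.gather(
        fetch_greeting("world"),
        fetch_greeting("python"),
    )
    print(greetings)

if __name__ == "__main__":
    asyncio.run(main())
```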
Some Python 3.x features felt like they should have always been there in Python 2.7, such as `enum` and `dataclasses` (we survived with `namedtuple`). These new data types allowed for a change in the mental model of how to describe the data you work with day-to-day in a Python program. And that really matters for language ergonomics. Some Python 3 features still feel a bit more experimental. For example: structural pattern matching and exception groups may feel a little awkward at first. But I do think they will start to click for Python programmers over time.
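As a small, hypothetical illustration of that shift in mental model (the class names here are made up for the example), a dataclass plus an enum gives you a self-describing record, and structural pattern matching can then destructure it by attribute:

```python
from dataclasses import dataclass
from enum import Enum

class Color(Enum):
    RED = "red"
    GREEN = "green"

@dataclass
class Pixel:
    x: int
    y: int
    color: Color = Color.RED

# Equality, repr, and a readable constructor come for free.
p = Pixel(x=3, y=4)
print(p)                 # Pixel(x=3, y=4, color=<Color.RED: 'red'>)
print(p == Pixel(3, 4))  # True

# Structural pattern matching (3.10+) matches on attributes of the dataclass.
match p:
    case Pixel(x=0, y=0):
        print("origin")
    case Pixel(color=Color.RED):
        print("a red pixel")
```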
As for everyday conveniences, one can thank the Python core team profusely for f-strings — yes, we survived with the ugly `%` formatting operator, but f-strings are much more flexible and readable, especially with proper syntax highlighting in editors.
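A quick before-and-after, with made-up values, shows the difference:

```python
name, score = "Ada", 0.91734

# Old-style % formatting: the values live far from where they appear.
print("user %s scored %.2f" % (name, score))

# f-string equivalent: the expression sits exactly where the value is rendered.
print(f"user {name} scored {score:.2f}")
```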
Other new features felt more like a “clean-up” of learnings from years of production Python use. For example, the stable C ABI. Super important for C extension authors, but mostly hidden from everyday Python users.
Perhaps unsurprisingly, the Python 3 developments that have generated the most buzz in the community are all centered around performance. Python long had a reputation as a “slow” language. This reputation never bothered me: I knew from experience that programmer time was expensive and CPU time was cheap and getting cheaper all the time. Python 2.7 was a language that fit very nicely in the limited capacity of a programmer brain, but suffered a bit in terms of CPU execution speed, especially compared to its competitors. This mostly stemmed from Python’s focus on simplicity. Its interpreter did not use the fancy (and largely inscrutable) JIT compilation techniques popular in the Java, C++, and JavaScript compiler engineering world. Python’s language design insisted on keeping a highly dynamic runtime, which though a good fit for the flexible associative agility of programmer cognition, was deeply at odds with machine optimization techniques.
What’s more, Python stayed a single-core language due to its Global Interpreter Lock (aka GIL), which was a perpetual source of disappointment for those interested in multi-core parallel programming. Lots of hemming and hawing about Go and Rust comparisons, while the Python programmers I knew just plowed ahead with multiprocessing and similar tools, despite their rough edges.
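The standard-library route around the GIL has long looked something like this minimal sketch (the CPU-bound workload is invented for illustration): each worker is a separate process with its own interpreter, so the GIL never becomes the bottleneck.

```python
from multiprocessing import Pool

def busy_sum(n: int) -> int:
    # A CPU-bound task: threads would serialize on the GIL here,
    # but separate processes each get their own interpreter and core.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(busy_sum, [10_000_000] * 4)
    print(results)
```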
We now have a lot of performance improvements being actively developed for Python. These span a number of important PEPs and projects, such as:
- PEP 554: Multiple Interpreters in the Stdlib (accepted)
- PEP 659: Specializing Adaptive Interpreter (draft)
- PEP 684: A Per-Interpreter GIL (accepted)
- Guido’s Faster CPython team
- The working nogil implementation
- PEP 703: Making the Global Interpreter Lock Optional in CPython (accepted)
Broadly, there are four performance approaches being explored, some of which are already successful:
- Make CPython Faster at single-core performance. This is what Guido’s team is working on with the “Faster CPython” project. The idea is to focus on optimizing the C codebase that is CPython, to get speedups that affect everyday code. You then upgrade your Python interpreter (from, say, 3.13 to 3.14 in celebration of Pi!) and you get speedups in working code “for free.” There is also some exciting and promising recent work on a copy-and-patch JIT compiler that would run within the CPython interpreter.
- Make CPython support a “nogil” mode. This is what PEP 703 is all about. With support for a GIL-free Python now accepted, there is the option to write multi-threaded, shared-memory Python code in such a way that it utilizes multiple cores on a single machine (see the sketch after this list).
- Make CPython support multiple Python interpreters running in the same process, aka subinterpreters. This is what PEP 684 is about. This is halfway between `multiprocessing` and nogil. With `multiprocessing`, you run separate Python interpreters as separate processes, and create an operating-system-managed communication channel between them (e.g. a `fork()` copy-on-write memory space, a UNIX pipe, a file, or a `/dev/shm` shared buffer). With subinterpreters, you could run two entirely parallel Python interpreters within the same process, and this might allow easier cross-platform in-process shared memory approaches vs operating-system-specific approaches.
- Make CPython easier to run across multiple nodes. This is a hard problem attacked at the community level by projects like `pyspark`, `ray`, and `dask`. The CPython core team is staying away from this one.
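To make the nogil option above concrete, here is a minimal sketch of the kind of code it unlocks (the workload is invented, and the speedup only materializes on a free-threaded build): plain threads sharing memory, finally able to scale a CPU-bound loop across cores instead of serializing on the GIL.

```python
import sysconfig
from concurrent.futures import ThreadPoolExecutor

def busy_sum(n: int) -> int:
    # CPU-bound work: on a standard build these threads take turns on the GIL;
    # on a free-threaded (PEP 703) build they can run on separate cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Py_GIL_DISABLED is set on free-threaded CPython builds (3.13+);
    # on older or standard builds this reports False.
    print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(busy_sum, [10_000_000] * 4))
    print(results)
```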
The core Python language remains a mental treat for programmers. The single-core performance of Python is getting faster. The option to go “natively” multi-core with Python is also becoming increasingly viable. The future of Python — light on your brain, fast on your machine — is bright!
Acknowledgements: Thank you to Rom for reviewing a draft of this essay and sharing some pointers to recent Python performance projects. Also, thanks to Cody Hiar for sending me some feedback on the post and reminding me that a few years back, I gave a relevant PyData NYC talk that covers some of these GIL and multi-core and multi-node processing challenges. That’s this one on YouTube: “Beating Python’s GIL to Max Out Your CPUs.” You can also skim the slides here.