Open Source

Core Python

When I describe my programming background these days, I say that I code “primarily in Python, JavaScript, Clojure, C… and Zig!” I put Python first in that list for good reason.

This is a post about the core Python language, but also the ways in which Python is evolving its single-core and multi-core CPU performance.

Python has been my go-to programming tool for a long time. When I started to build out my last startup and shipped the production core of its product, Python 2.7 had just stabilized, creating an excellent “core language.” This is a language that I truly respected, as evidenced by my style guide. And, as I discussed in my Python technical book review round-up, this core language was best described by David Beazley in the first half of his Python Essential Reference book, which was also turned into an excellent standalone volume (which includes Python 3.x coverage), Python Distilled.

Many, many useful open source projects, companies, and projects were built atop that Python 2.7 core foundation of a language. Its community truly flourished.

Continue reading Core Python

Dependency rejection

Sam Altman once said: “Minimize your own cognitive load from distracting things that don’t really matter. It’s hard to overstate how important this is, and how bad most are at it. Get rid of distractions in your life. Develop very strong ways to avoid letting crap pile up.”

In programming, there is a technique called “dependency injection.” It’s a way of worrying, up front, about how to split the modules in your program from other code. You aim to give yourself the future benefit of being able to swap out a module dependency later.

In some communities, this little technique led to a temporary fad of constructing programs within a baroque superstructure via “dependency injection frameworks,” sometimes called “inversion of control frameworks.” With these, programmers fretted about a future that usually never arrived, and their worries expressed themselves as defensive code, all in the name of future-proofing. This meant in addition to spending time adopting a dependency, they also had a meta-distraction: working that dependency into their “management framework,” alongside the others. This sort of thing took some programmers very far away from the core “essential complexity” at the heart of their code.¹

Dependencies seem to be all around us, both in the real world, and in programming. And they are perniciously distracting in just this way. Have you ever noticed how rare it is for you to just do something?

If so, you might have been worrying, up front, about dependencies.²

In the back of your mind, you wonder: has this been solved? The world is full of solutions in search of problems, and the internet is always nearby. You convince yourself to research if some of these happen to apply to your problem.

Even if you don’t find something off-the-shelf, you still might think to yourself: I could just do this, but is this a good use of my time? Lacking a ready-made solution, you then turn to delegation: whom shall I ask to solve this? After all, the market usually obliges with eager participants willing to sell their labor for a price.

But whether your proposed solution is an object or a person (and whether it is free or paid), there’s one thing it always is: a dependency.

Continue reading Dependency rejection

How Python programmers can uncontroversially approach build, dependency, and packaging tooling (+ a note on Zig)

A few years back, I published The Elements of Python Style, a popular Python code style guide. Since publishing it, friends of mine in the Python community have wondered if I might consider adding a section about package installation, dependency management, and other similar “standard tooling” recommendations.

This is a reasonable request, since Python lacks much in the way of standard tooling. I took a stab at this in a pull request here, but then abandoned my attempt. The length of this “section” started to approach the length of the overall style guide itself! So, I gave up on that. I decided to turn the section into this blog post here, instead. Then, I’ll share one thought about how emerging programming language communities, such as Zig’s, could learn from the Python experience.

On “Standard” Tools

There’s a zoo of tooling options out there, and no “standard” Python tooling beyond the python executable, and, perhaps, pip (for installing packages, which was semi-formalized in Python 3.x with PEP 453 and PEP 508). Here, we’ll discuss an opinionated (yet uncontroversial) approach to “standard” tooling with Python.

Continue reading How Python programmers can uncontroversially approach build, dependency, and packaging tooling (+ a note on Zig)

Turning n/2 + 1

When I turned 27, I wrote the following in my birthday post:

I don’t need stuff. I just need time. Of course, that’s the bittersweet part of one’s birthday. That even as you come to realize the importance of time, the day acts as a reminder of how our time on this earth is limited. 1 day passes, and only n-1 left to make a difference.

The average life expectancy for a US male born in 1984 is 75. I just turned 38 today. Therefore, it’s fair to say, I just turned n/2 + 1.

That is perhaps a bit too fatalistic and reductive. The number n is not guaranteed to be 75. “Don’t be so morbid!” someone might exclaim to me. “After all, many people live to 80, 90, even 100. And medicine improves all the time.”

Well, yes, this is true. But, it’s also likely — and increasingly so — that I might die any minute. Freak accidents, a late-discovered birth defect. Or, just losing the medical lottery in middle age. So, I return to the wisdom of my youth: “I don’t need stuff. I just need time.”

But, toward what end? That has been the interesting riddle of approaching n/2 with the following undeniable privileges:

good health
professional satisfaction
financial security
confidence in my irreversible life choices

Many approach this same milestone with none of the above, and many would love for any one of them to be squared away.

The honest truth is, I find myself heavy with the weight of these privileges. History is, as Harold Bloom once put it when describing literature, “a conflict between past genius and present aspiration, in which the prize is […] survival.” Here, he was referring to well-crafted stories. “Survival” meant their perpetuation through the ages via timeless literary relevance, something he referred to as “canonical inclusion”.

But what of my field, software?

Continue reading Turning n/2 + 1

Parse.ly, Automattic: the long view

In 2009, I quit my first programming job after college to work on a startup. That startup eventually became Parse.ly. I’ve written about Parse.ly’s startup beginnings and evolution elsewhere on this blog, including:

It is 2021 now, more than 12 years since the company’s original founding. And much has changed.

Parse.ly “the startup” was a rollercoaster, like all startups are, but it was, ultimately, a success. In 2009, we were a tiny 3-person team hacking away on prototypes at a startup accelarator in Philadelphia. In 2012, Parse.ly had its first handful of customers for the content analytics system that became our core product, and shifted into enterprise SaaS as a business model. In 2013, we raised “Series A” style financing to pursue the ambition of defining and leading the content analytics category.

By 2017, it was clear that Parse.ly had done just that: we had built a valuable product in the market and a beautiful SaaS business model, where our R&D efforts were aligned with our customer needs. There was only a question of total market size. As a result, in 2018 we shifted our efforts to expanding Parse.ly’s market — acting as a content measurement layer not just for major media and entertainment companies, but for all content management systems and all content-rich websites.

By the end of 2019 and heading into early 2020, it was clear that Parse.ly was succeeding in this new vision, and was going to be a SaaS company “in it for the long-term”, serving our customers for years to come.

Continue reading Parse.ly, Automattic: the long view

Learning about babashka (bb), a minimalist Clojure for building CLI tools

A few years back, I wrote Clojonic: Pythonic Clojure, which compares Clojure to Python, and concluded:

My exploration of Clojure so far has made me realize that the languages share surprisingly more in common than I originally thought as an outside observer. Indeed, I think Clojure may be the most “Pythonic” language running on the JVM today (short of Jython, of course).

That said, as that article discussed, Clojure is a very different language than Python. As Rich Hickey, the creator of Clojure, put it in his “A History of Clojure”:

Most developers come to Clojure from Java, JavaScript, Python, Ruby and other OO languages. [… T]he most significant […] problem [in adopting Clojure] is learning functional programming. Clojure is not multiparadigm, it is FP or nothing. None of the imperative techniques they are used to are available. That said, the language is small and the data structure set evident. Clojure has a reputation for being opinionated, opinionated languages being those that somewhat force a particular development style or strategy, which I will graciously accept as meaning the idioms are clear, and somewhat inescapable.

There is one area in which Clojure and Python seem to have a gulf between them, for a seemingly minor (but, in practice, major) technical reason. Clojure, being a JVM language, inherits the JVM’s slow start-up time, especially for short-lived scripts, as is common for UNIX CLI tools and scripts.

As a result, though Clojure is a relatively popular general purpose programming language — and, indeed, one of the most popular dynamic functional programming languages in existence — it is still notably unpopular for writing quick scripts and commonly-used CLI tools. But, in theory, this needn’t be the case!

Continue reading Learning about babashka (bb), a minimalist Clojure for building CLI tools

Python 3 is here and the sky is not falling

James Bennett, a long-time Python developer, blogger, and contributor to Django, recently wrote a nice post about the “end” of Python 2.x, entitled “Variations on the Death of Python 2.” It’s a great read for anyone who, like me, has been in the Python community a long time.

I’ve been a Python user since the early 2.x days, first discovering Python in a print copy of Linux Journal in the year 2000, where a well-known open source developer and advocate described his transition from Perl to Python. He wrote:

I was generating working code nearly as fast as I could type. When I realized this, I was quite startled.

An important measure of effort in coding is the frequency with which you write something that doesn’t actually match your mental representation of the problem, and have to backtrack on realizing that what you just typed won’t actually tell the language to do what you’re thinking. An important measure of good language design is how rapidly the percentage of missteps of this kind falls as you gain experience with the language. When you’re writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you’ve achieved mastery of the language.

But that didn’t make sense, because it was still day one and I was regularly pausing to look up new language and library features!

This was my first clue that, in Python, I was actually dealing with an exceptionally good design.

Python’s wonderful design as a language has always been a source of inspiration for me. I even wrote “The Elements of Python Style”, as an ode to how good Python code, to me, felt like good written prose. And, of course, many of my personal and professional projects are proudly Python Powered.

python-powered

Thus, I was always a little worried about the Python 2 to 3 transition. I was concerned that this one big risk, taken on by the core team, could imperil the entire language, and thus the entire community. Perl 5 had embarked on a language schism toward Perl 6 (now Raku), and many believe that both communities (Perl 5 and Raku) became weaker as a result.

But, here we are in 2020, and Python 2 is EOL, and Python 3 is here to stay. A lot of the internet debates about Python 2 vs Python 3 (like this flame war on lobste.rs) now seem to boil down to this question: was Python 3 a good idea, in retrospect?

Continue reading Python 3 is here and the sky is not falling

JavaScript: The Modern Parts

In the last few months, I have learned a lot about modern JavaScript and CSS development with a local toolchain powered by Node 8, Webpack 4, and Babel 7. As part of that, I am doing my second “re-introduction to JavaScript”. I first learned JS in 1998. Then relearned it from scratch in 2008, in the era of “The Good Parts”, Firebug, jQuery, IE6-compatibility, and eventually the then-fledgling Node ecosystem. In that era, I wrote one of the most widely deployed pieces of JavaScript on the web, and maintained a system powered by it.

Now I am re-learning it in the era of ECMAScript (ES6 / ES2017), transpilation, formal support for libraries and modularization, and, mobile web performance with things like PWAs, code splitting, and WebWorkers / ServiceWorkers. I am also pleasantly surprised that JS, via the ECMAScript standard and Babel, has evolved into a pretty good programming language, all things considered.

To solidify all this stuff, I am using webpack/babel to build all static assets for a simple Python/Flask web app, which ends up deployed as a multi-hundred-page static site.

One weekend, I ported everything from Flask-Assets to webpack, and to play around with ES2017 features, as well as explore the Sass CSS preprocessor and some D3.js examples. And boy, did that send me down a yak shaving rabbit hole. Let’s start from the beginning!

JavaScript in 1998

I first learned JavaScript in 1998. It’s hard to believe that this was 20 years — two decades! — ago. This post will chart the two decades since — covering JavaScript in 1998, 2008, and 2018. The focus of the article will be on “modern” JavaScript, as of my understanding in 2018/2019, and, in particular, what a non-JavaScript programmer should know about how the language — and its associated tooling and runtime — have dramatically evolved. If you’re the kind of programmer who thinks, “I code in Python/Java/Ruby/C/whatever, and thus I have no use for JavaScript and don’t need to know anything about it”, you’re wrong, and I’ll describe why. Incidentally, you were right in 1998, you could get by without it in 2008, and you are dead wrong in 2018.

Further, if you are the kind of programmer who thinks, “JavaScript is a tire fire I’d rather avoid because it lacks basic infrastructure we take for granted in ‘real’ programming languages”, then you are also wrong. I’ll be able to show you how “not taking JavaScript seriously” is the 2018 equivalent of the skeptical 2008-era programmer not taking Python or Ruby seriously. JavaScript is a language that is not only here to stay, but has already — and will continue to — take over the world in several important areas. To be a serious programmer, you’ll have to know JavaScript’s Modern and Good Parts — as well as some other server-side language, like Python, Ruby, Go, Elixir, Clojure, Java, and so on. But, though you can swap one backend language for the other, you can’t avoid JavaScript: it’s pervasive in every kind of web deployment scenario. And, the developer tooling has fully caught up to your expectations.

Continue reading JavaScript: The Modern Parts

Shipping the Second System

In 2015-2016, the Parse.ly team embarked upon the task of re-envisioning its entire backend technology stack. The goal was to build upon the learnings of more than 2 years delivering real-time web content analytics, and use that knowledge to create the foundation for a scalable stream processing system that had built-in support for fault tolerance, data consistency, and query flexibility. Today in 2019, we’ve been running this new system successfully in production for over 2 years. Here’s what we learned about designing, building, shipping, and scaling the mythical “second system”.

The Second System Effect

But why re-design our existing system? This question lingered in our minds a few years back. After all, the first system was successful. And I had the lessons of Frederick Brooks accessible and nearby when I embarked on this project. He wrote in The Mythical Man-Month:

Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs.

When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a “big pile.”

Were we suffering from engineering hubris to redesign a working system? Perhaps. But we may have been suffering from something else altogether healthy — the paranoia of a high-growth software startup.

I discuss Parse.ly’s log-oriented architecture at Facebook’s HQ for PyData Silicon Valley, with Parse.ly’s VP of Engineering, Keith Bourgoin.

Our product had only just been commercialized. We were a team small enough to be nimble, but large enough to be dangerous. Yes, there were only a handful of engineers. But we were operating at the scale of billions of analytics events per day, on-track to serve hundreds of enterprise customers who required low-latency analytics over terabytes of production data. We knew that scale was not just a “temporary problem”. It was going to be the problem. It was going to be relentless.

Continue reading Shipping the Second System