Data engineers should abstract their code in the most lightweight way possible to facilitate downstream integration in a large-scale data system.
You want lego blocks, not puzzle pieces.
The creators of the C programming language once famously said, “first make it work, then make it right, and, finally, make it fast.” This adage still applies today.
The difference is that we now have tools to take working code and validate that it is right against reams of data. Many of these tools can also make that working, correct code run fast across a cluster of machines, possibly even in real time as the data comes in.
But making code work, then right, then fast requires some discipline.
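As a rough illustration of what a "lego block" might look like in Python, here is a minimal sketch. The names `tokenize` and `count_words` are hypothetical and not taken from any real codebase. The assumption is that the block is a plain function over an iterable, with no framework imports, so the same code can be checked on a small list first and later handed to whatever batch or streaming system sits downstream.

```python
from collections import Counter
from typing import Iterable, List


def tokenize(line: str) -> List[str]:
    """Split one line of text into lowercase, whitespace-separated words."""
    return line.lower().split()


def count_words(lines: Iterable[str]) -> Counter:
    """Count words across any iterable of lines (a list, a file, a generator)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


if __name__ == "__main__":
    # Step one, "make it work": run the block on a tiny in-memory sample.
    sample = ["Lego blocks snap together", "puzzle pieces fit one way", "lego again"]
    print(count_words(sample).most_common(1))
```

Because `count_words` accepts any iterable, nothing in it has to change when the input grows from a three-line sample to a file or a stream of records.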
Continue reading Simple Lego Blocks for Big Data
Let’s say you’ve just joined my team and want to become an idiomatic Python programmer. Where do you begin?
Well, you can move up the learning curve quickly using resources from this blog:
I also have some good resources on web development with Python:
And on more advanced Python concepts, like dunders and functional programming:
Continue reading Idiomatic Python Resources
I read the Bloomberg piece, What Is Code?, an explanation of code artistry and programmer/hacker culture in 2015. I love this paragraph about “languages as liquid infrastructure”:
The point is that things are fluid in the world of programming, fluid in a way that other industries don’t seem to be. Languages are liquid infrastructure. You download a few programs and, whoa, suddenly you have a working Clojure environment. Which is actually the Java Runtime Environment. You grab an old PC that’s outlived its usefulness, put Linux on it, and suddenly you have a powerful Web server. Now you can participate in whole new cultures. There are meetups, gatherings, conferences, blogs, and people chatting on Twitter. And you are welcomed. They are glad for the new blood.
Java was supposed to supplant C and run on smart jewelry. Now it runs application servers, hosts Lisplike languages, and is the core language of the Android operating system. It runs on billions of things. It won. C and C++, which it was designed to supplant, also won. A lot of things keep winning because computers keep getting more plentiful. It’s weird.
Worse is better, is worse, is better, is worse, is better…
Python is the core programming language used at Parse.ly. It is also a fast-growing language with wide adoption among open source projects, so it’s no wonder it is becoming the leading language for software teams.
I’ve written a couple of blog posts with original material for learning Python, including “import this: learning the Zen of Python with code and slides” and “Build a web app fast”.
Newcomers to Python are often overwhelmed by the wealth of information about the language, both online and in print. I am often asked, “What are the best books for my Python team?” This post answers that question by highlighting what I consider to be the three best Python books on the market today.
Continue reading The 3 Best Python Books for Your Team
I realize now that one of the hardest parts of running a successful startup is “betting” on tech stacks that, 3 years out, will have a groundswell of community support around them.
It’s still shocking to me that when I chose each of the following technologies as a central part of Parse.ly, they were so new and immature that they didn’t even show up in a Google search trends box, yet they are now very popular.
Continue reading Picking tech stacks
Esko Kilpi wrote:
For most of the developed world, firms, as much as markets, make up the dominant economic pattern. The Internet is nothing less than an extinction-level event for the traditional firm. The Internet, together with technological intelligence, makes it possible to create totally new forms of economic entities… Also very small firms can do things that in the past required very large organizations.
This is true. But for certain small firms that are run as fully distributed teams (as mine, Parse.ly, is) the Internet is an extinction-level event for the physical manifestation of the firm — the office.
Already, companies such as GitHub and Automattic have minimized the importance of co-location in work collaboration. Successful massive creative projects are delivered not just by distributed teams, but also by volunteer teams, such as those behind the Linux kernel and Wikipedia. I wrote about this in my essay, “Fully Distributed Teams: Are They Viable?”
Continue reading Office extinction for knowledge workers and the rise of fully distributed teams
In 2009, Jack & Russ hacked on an early prototype of SeatGeek for the Dreamit Ventures summer class in Philadelphia. The initial prototype came together in the last two weeks before demo day. I remember that Russ hadn’t shaved in weeks because they were spending every night hacking.
You see, before that, the founding pair knew they wanted to start a company, but they weren’t sure about the idea. They had brainstormed ideas ranging from “WebMD for pets” to “amateur art marketplaces”, finally landing at “Yelp for Bloggers”, an idea they called Scribnia. This got them into Dreamit Ventures.
Continue reading What entrepreneurship really looks like
From a poster at Hacker News commenting on The New Yorker article, Inside the Collapse of The New Republic:
I think I’m exactly the audience that TNR wants. I’m well-educated, make a good living, largely agree with them politically, enjoy long-form journalism, and am familiar with the brand and its history.
Yet I don’t think I would ever subscribe to TNR. I just see a magazine as something that’s going to pile up in my house. I can read more than enough great content online for free. If I was going to subscribe to a magazine, I think that The New Yorker is a lot more interesting than The New Republic.
Take note, journalistas. This is how your readers view your stuff — not as a “public trust”, “a voice”, or “a cause”, as TNR was described by the exiting editors in their resignation letter.
For better or worse, readers view your stuff as a product. And a product, to be bought, let alone used, needs to be useful.
Continue reading The New Republic as a product
Interesting insider Q&A with Paul Sutter, co-founder of Quantcast. Via Hacker News:
Q: What methodical process did you follow for your startup? Did you first test the market using tactics similar to the lean startup approach?
A: Basically, make a list of known problems that you’re well suited to solving, rank them by criteria, fail a lot, bang your head against the wall, and eventually things start to stick.
Continue reading Solving problems with startups
Apache Storm, Kafka, and Spark are gaining a lot of momentum in the data analysis and processing communities. I was curious whether the interest in using these technologies with Python, in particular, is growing. Based on these Google Trends reports, it seems like it is.
Continue reading Web interest in Apache Storm, Kafka, Spark in the Python community