Clojonic: Pythonic Clojure

Andrew Montalenti, CTO

Pythonic iteration

nums = [45, 23, 51, 32, 5]
for idx, num in enumerate(nums):
    print(idx, num)

# 0 45
# 1 23
# 2 51
# 3 32
# 4 5

Clojonic iteration (1)

(let [nums [45 23 51 32 5]]
    (for [[idx num] (map-indexed vector nums)]
        (println idx num)))

; 0 45
; 1 23
; 2 51
; 3 32
; 4 5

Clojonic iteration (2)


Clojonic iteration (3)

(defn enumerate [coll]
    (map-indexed vector coll))

(let [nums [45 23 51 32 5]]
    (for [[idx num] (enumerate nums)]
        (println idx num)))

; 0 45
; 1 23
; 2 51
; 3 32
; 4 5
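
For readers coming from Python, here is a minimal sketch of what (map-indexed vector coll) computes. The names map_indexed and vector are hypothetical helpers written only for this comparison; they are not standard-library functions.

```python
def map_indexed(f, coll):
    """Lazily yield f(index, item) for each item, like Clojure's map-indexed."""
    for idx, item in enumerate(coll):
        yield f(idx, item)


def vector(*items):
    """Stand-in for Clojure's vector: collect the arguments into a list."""
    return list(items)


nums = [45, 23, 51, 32, 5]
# list() forces the lazy generator, much as printing at the REPL would.
pairs = list(map_indexed(vector, nums))
for idx, num in pairs:
    print(idx, num)
```

In everyday Python you would simply call enumerate(nums); the helpers only make the shape of the Clojure expression visible.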

Clojonic iteration (4)

(defmacro enumerate [coll]
    `(map-indexed vector ~coll))

(let [nums [45 23 51 32 5]]
    (for [[idx num] (enumerate nums)]
        (println idx num)))

; ---

(macroexpand '(enumerate [1 2 3]))

    (clojure.core/map-indexed clojure.core/vector [1 2 3])

; Translation: the code is *replaced* with:

    (map-indexed vector [1 2 3])

defn... is a macro!

;; expand:

    (macroexpand '(defn named-function [some-value]
        (println some-value)))

;; generated code:

    (def named-function (clojure.core/fn ([some-value]
        (println some-value))))

;; the clojure.core source is something like:

    (defmacro defn [name & fdecl]
        (list 'def name (cons `fn fdecl)))

Sample Python program


import json

def with_twitter_data(rdr_fn):
    with open("data/tweets.log") as rdr:
        return list(rdr_fn(rdr))

def read_tweets(rdr):
    for line in rdr:
        apikey, timestamp, entry = line.split("|", 2)
        yield apikey, timestamp, json.loads(entry)


Clojure "syntax"

;; twitter.clj
(ns twitter
    (:require [ :as json]
              [ :as io]
              [clojure.string :as str]))

(defn with-twitter-data [rdr-fn]
    (with-open [rdr (io/reader "data/tweets.log")]
        (doall (rdr-fn rdr))))

(defn read-tweets [rdr]
    (for [line (line-seq rdr)]
        (let [[apikey timestamp entry] (str/split line #"\|" 3)]
            (vec [apikey timestamp (json/read-str entry)]))))

(with-twitter-data read-tweets)
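
Both versions force the lazy sequence before the file closes: list() in Python, doall in Clojure. The sketch below shows why, using an in-memory io.StringIO in place of data/tweets.log. The sample line is made up for illustration.

```python
import io
import json


def read_tweets(rdr):
    # Generator: no line is read until the caller iterates.
    for line in rdr:
        apikey, timestamp, entry = line.split("|", 2)
        yield apikey, timestamp, json.loads(entry)


# Made-up sample data standing in for data/tweets.log.
SAMPLE = 'key1|1400000000|{"url": "http://example.com/a"}\n'

# Eager: list() drains the generator while the reader is still open.
with io.StringIO(SAMPLE) as rdr:
    eager = list(read_tweets(rdr))

# Lazy: the generator escapes the with block and is consumed too late.
with io.StringIO(SAMPLE) as rdr:
    lazy = read_tweets(rdr)
try:
    list(lazy)
    failed = False
except ValueError:
    # Iterating a closed reader raises ValueError.
    failed = True

print(eager, failed)
```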

Quick comparison

Idea            Python            Clojure
Binding         label = val       (let [label val])
Unpacking       val1, val2 =      Destructuring Form
Iteration       for               (for) macro
Functions       def               (defn) macro
File Open       open()            (io/reader)
String Split    str.split()       (str/split)
JSON Parse      json.loads()      (json/read-str)
Namespaces      Modules           (ns) macro
Imports         import            (ns (:require))
Data Structs    {} [] (,)         {} [] ()

Clojure unique stuff

Idea                      Python                 Clojure
Macros                    N/A                    Built-in
Immutable Data Structs    N/A                    Built-in
Keywords                  N/A                    Built-in
Lambdas / Blocks          Crippled               Built-in
DSLs                      With Metaclasses       With Macros
Lazy Eval                 Opt-in (generators)    Opt-out (doall)
Code as Data              import ast             Lisp forms
Platform Interop          via C, Cython, etc.    via JVM
Concurrency               Trad'l, processes      STM Impl
Object Orientation        Trad'l, class-based    Multi dispatch
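
To make the "Code as Data" row concrete, here is a small sketch of the Python side: the ast module parses source into a tree that can be inspected, rewritten, and compiled. It is only a rough analogue of what a Clojure macro does with Lisp forms, and the statement being parsed is an arbitrary example.

```python
import ast

# Parse a statement into a tree of nodes instead of running it.
tree = ast.parse("total = 1 + 2")
assign = tree.body[0]

# The statement is an Assign node whose value is a BinOp using Add.
node_types = (
    type(assign).__name__,
    type(assign.value).__name__,
    type(assign.value.op).__name__,
)

# Rewrite the tree: turn the addition into a multiplication.
assign.value.op = ast.Mult()
ast.fix_missing_locations(tree)

# Compile and run the rewritten tree in a fresh namespace.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
print(node_types, namespace["total"])
```

The rewrite here happens at run time on a parsed tree, whereas a Clojure macro transforms forms before the code is compiled.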

Other examples

EMR cluster (lemur)

(defcluster pig-cluster
    :master-instance-type "m1.large"
    :slave-instance-type "m1.large"
    :num-instances 2
    :keypair "emr_jobs"
    :enable-debugging? false
    :bootstrap-action.1 [
        (s3-libs "/pig/pig-script")
        ["--base-path" (s3-libs "/pig/")
        "--install-pig" "--pig-versions" "latest"]]
    :runtime-jar (s3-libs "/script-runner/script-runner.jar"))

EMR Pig steps (lemur)

(defstep twitter-count-step
    :args.positional [
        (s3-libs "/pig/pig-script")
        "--base-path" (s3-libs "/pig/")
        "--pig-versions" "latest"
        "--run-pig-script" "--args"
        "-f" "s3://pystorm/url_counts.pig"])

(fire! pig-cluster twitter-count-step)

Twitter Click Spout (Storm)

        ;; Python Spout implementation:
        ;; - fetches tweets (e.g. from Kafka)
        ;; - emits (urlref, url, ts) tuples
        ["python" ""]
        ;; Stream declaration:
        ["urlref" "url" "ts"]

Twitter Count Bolt (Storm)

        ;; Bolt input: Spout and field grouping on urlref
        {"twitter-click-spout" ["urlref"]}
        ;; Python Bolt implementation:
        ;; - maintains a Counter of urlref
        ;; - increments as new clicks arrive
        ["python" ""]
        ;; Emits latest click count for each tweet as new Stream
        ["twitter_link" "clicks"]
        :p 4
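
The counting logic described in the bolt comments can be sketched in plain Python. This is an assumption-laden illustration: ClickCountBolt and its process method are hypothetical names, and the class does not use the Storm multilang protocol or any real Storm API. It only shows the Counter bookkeeping.

```python
from collections import Counter


class ClickCountBolt:
    """Hypothetical stand-in for the bolt; not the Storm multilang API."""

    def __init__(self):
        # Running click count per urlref.
        self.counts = Counter()

    def process(self, tup):
        # Input tuples follow the spout's stream: (urlref, url, ts).
        urlref, url, ts = tup
        self.counts[urlref] += 1
        # Emit the latest count as a (twitter_link, clicks) tuple.
        return (urlref, self.counts[urlref])


# Made-up click tuples for illustration.
clicks = [
    ("t.co/a", "http://example.com/1", 1),
    ("t.co/b", "http://example.com/2", 2),
    ("t.co/a", "http://example.com/1", 3),
]
bolt = ClickCountBolt()
emitted = [bolt.process(click) for click in clicks]
print(emitted)
```

The field grouping on urlref is what makes a per-process Counter reasonable: all tuples for a given urlref are routed to the same bolt task.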

Running local Storm cluster

(defn run-local! []
    (let [cluster (LocalCluster.)]
        ;; submit the topology configured above
        (.submitTopology cluster
                        ;; topology name
                        ;; topology settings
                        {TOPOLOGY-DEBUG true}
                        ;; topology configuration
        ;; sleep for 5 seconds before...
        (Thread/sleep 5000)
        ;; shutting down the cluster
        (.shutdown cluster)

Command line parsing

(defn -main [& args]
    (let [[opts args banner]
        (cli args
            ["-h" "--help" "Show help"
                :flag true :default false]
            ["-v" "--verbose" "Verbose output"
                :flag true :default false]
            ["-s" "--spec" "Storm Topology spec .clj file"]
            ["-j" "--jar" "Storm Topology code .jar file"]
            ["-c" "--config" "Storm Environment config file"
                :default "config.json"]
            ["-d" "--debug" "Enable Storm Topology debugging"
                :default true]
            ["-e" "--env" "Environment, e.g. prod or local"
                :default "local"])]
        (println opts args banner)))
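
For comparison, roughly the same option spec in Python's argparse. This is a sketch rather than a translation of any real program: the description string and the sample arguments are invented, argparse supplies -h/--help on its own, and the --debug option assumes Python 3.9+ for BooleanOptionalAction.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Storm topology runner (sketch)")
    # -h/--help is added by argparse automatically.
    parser.add_argument("-v", "--verbose", action="store_true", default=False,
                        help="Verbose output")
    parser.add_argument("-s", "--spec", help="Storm Topology spec .clj file")
    parser.add_argument("-j", "--jar", help="Storm Topology code .jar file")
    parser.add_argument("-c", "--config", default="config.json",
                        help="Storm Environment config file")
    # BooleanOptionalAction (Python 3.9+) also generates --no-debug.
    parser.add_argument("-d", "--debug", action=argparse.BooleanOptionalAction,
                        default=True, help="Enable Storm Topology debugging")
    parser.add_argument("-e", "--env", default="local",
                        help="Environment, e.g. prod or local")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
opts = build_parser().parse_args(["-s", "topology.clj", "--env", "prod"])
print(opts)
```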

SQL table creation

(ns ring-sample.sqlite
    (:require   [ :as jdbc]))

(def db {:classname   "org.sqlite.JDBC"
         :subprotocol "sqlite"
         :subname     "data/database.db"})

(defn make-table! []
        (jdbc/with-connection db
                    [:id :integer]
                    [:apikey :text]
                    [:name :text]
                    [:seen :datetime]))
    (catch Exception exc (println exc))))
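
A comparable table creation with Python's built-in sqlite3 module might look like the sketch below. The table name tweets is hypothetical (the slide does not show one), and an in-memory database stands in for data/database.db.

```python
import sqlite3

# Hypothetical table name; the original slide does not show one.
TABLE = "tweets"


def make_table(conn):
    try:
        conn.execute(
            f"CREATE TABLE {TABLE} ("
            "id INTEGER, apikey TEXT, name TEXT, seen DATETIME)"
        )
    except sqlite3.Error as exc:
        # Mirror the Clojure version: report the error instead of raising.
        print(exc)


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
make_table(conn)

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({TABLE})")]
print(columns)
```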



lein is like venv, pip, ipython, setuptools, and buildout all combined in one tool.

Through plugins, it can also embed the build tasks that you'd normally put in a Makefile.

It also includes "project quickstarts", similar to Django's startproject and cookiecutter.

lein is written in Clojure, but you also install Clojure itself using lein, by declaring a dependency on Clojure in your project.clj file.

project.clj example

;;            project            version
;;               ^                  ^
(defproject parsely-stormtest "0.0.1-SNAPSHOT"
    ;; code locations
    :source-paths ["src/clj"]
    :resource-paths ["multilang", "data"]

    ;; project dependencies
    :dependencies   [[org.apache.storm/storm-core "0.9.1"]
                     [org.clojure/clojure "1.5.1"]
                     [org.clojure/data.json "0.2.4"]
                     [org.clojure/tools.namespace "0.2.4"]]

    ;; invoked by lein run
    :main parsely.stormtest

    ;; lein compile options
    :min-lein-version "2.0.0"
    :aot :all)


Functionality                      How
Just the Clojure REPL              java -jar clojure.jar
Clojure REPL w/ dependencies       lein repl
CLI entry-point w/ dependencies    lein run


Functionality          How
Eval code in editor    nREPL plugins for vim / emacs / SublimeText
Run-debug loop         Repeated lein run calls at CLI, or tests
Interact with code     LightTable embeds Clojure really nicely
Code notebook          session is a project akin to IPython Notebook


Functionality          How
Unit test framework    Use clojure.test, then lein test
Test-driven / BDD      midje, speclj, expectations
Parametric             simulant, test.check
Assertions             (is (.startsWith "abcde" "ab"))
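
For Python readers, the assertion in the last row maps onto unittest roughly as follows. This is a minimal sketch; the class and method names are arbitrary, and the test is run programmatically only so the example is self-contained.

```python
import unittest


class StartsWithTest(unittest.TestCase):
    def test_prefix(self):
        # Same check as (is (.startsWith "abcde" "ab")).
        self.assertTrue("abcde".startswith("ab"))


# Load and run the single test case without touching sys.argv.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StartsWithTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```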

Go forth!

Install lein:

Read a little bit more about Clojure: