Clojonic: Pythonic Clojure

Andrew Montalenti, CTO

Pythonic iteration

nums = [45, 23, 51, 32, 5]
for idx, num in enumerate(nums):
    print(idx, num)

# 0 45
# 1 23
# 2 51
# 3 32
# 4 5

Clojonic iteration (1)

(let [nums [45 23 51 32 5]]
    ;; doseq runs eagerly for side effects; for would only build a lazy seq
    (doseq [[idx num] (map-indexed vector nums)]
        (println idx num)))

; 0 45
; 1 23
; 2 51
; 3 32
; 4 5

Clojonic iteration (2)

[Image: _images/clojure_syntax.png]

Clojonic iteration (3)

(defn enumerate [coll]
    (map-indexed vector coll))

(let [nums [45 23 51 32 5]]
    (doseq [[idx num] (enumerate nums)]
        (println idx num)))

; 0 45
; 1 23
; 2 51
; 3 32
; 4 5

Clojonic iteration (4)

(defmacro enumerate [coll]
    `(map-indexed vector ~coll))

(let [nums [45 23 51 32 5]]
    (doseq [[idx num] (enumerate nums)]
        (println idx num)))

; ---

(macroexpand '(enumerate [1 2 3]))

    (clojure.core/map-indexed clojure.core/vector [1 2 3])

; Translation: the macro call is *replaced* with:

    (map-indexed vector [1 2 3])
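
Python has no macro system, but its standard ast module gives a rough analog of treating code as data. The sketch below is only an illustration for comparison, not something from the talk: it parses a call, inspects it, rewrites it before it runs, and then evaluates the result. The variable names are made up, and it assumes Python 3.8+, where literals parse to ast.Constant.

```python
import ast

# Parse source text into a tree: code becomes data we can inspect.
source = "enumerate([45, 23, 51])"
tree = ast.parse(source, mode="eval")
call = tree.body                      # the ast.Call node for enumerate(...)

func_name = call.func.id              # name of the function being called
arg_values = [elt.value for elt in call.args[0].elts]  # literal list items

# Rewrite the tree before it runs, loosely like a macro expansion:
# append a second positional argument so counting starts from 1.
call.args.append(ast.Constant(value=1))
ast.fix_missing_locations(tree)

# Compile the modified tree back into executable code and run it.
result = list(eval(compile(tree, filename="<ast>", mode="eval")))
print(func_name, arg_values, result)
```

Unlike a Clojure macro, this rewriting is a separate, explicit step rather than part of the language's normal evaluation.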

defn... is a macro!

;; expand:

(macroexpand
    '(defn named-function [some-value]
        (println some-value)))

;; generated code:

    (def named-function (clojure.core/fn ([some-value]
        (println some-value))))

;; the clojure.core source is, in much simplified form, something like:

    (defmacro defn [name & fdecl]
        (list 'def name (cons `fn fdecl)))

Sample Python program

"""twitter.py"""

import json

def with_twitter_data(rdr_fn):
    with open("data/tweets.log") as rdr:
        return list(rdr_fn(rdr))

def read_tweets(rdr):
    for line in rdr:
        apikey, timestamp, entry = line.split("|", 2)
        yield apikey, timestamp, json.loads(entry)

with_twitter_data(read_tweets)
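
To try read_tweets without a data/tweets.log file, an in-memory file works as a stand-in. This is a minimal sketch: the two sample lines are hypothetical and only follow the apikey|timestamp|JSON layout implied by the split call above.

```python
import io
import json

def read_tweets(rdr):
    for line in rdr:
        # maxsplit=2 keeps any "|" inside the JSON payload intact
        apikey, timestamp, entry = line.split("|", 2)
        yield apikey, timestamp, json.loads(entry)

# Hypothetical sample data, invented for illustration only
sample = io.StringIO(
    'abc123|1396000000|{"text": "a|b", "user": "someone"}\n'
    'def456|1396000060|{"text": "hello", "user": "other"}\n'
)

tweets = list(read_tweets(sample))
print(tweets[0])
```

The first sample line has a pipe inside its JSON text, which shows why the split is limited to two cuts.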

Clojure "syntax"

;; twitter.clj
(ns twitter
    (:require [clojure.data.json :as json]
              [clojure.java.io :as io]
              [clojure.string :as str]))

(defn with-twitter-data [rdr-fn]
    (with-open [rdr (io/reader "data/tweets.log")]
        (doall (rdr-fn rdr))))

(defn read-tweets [rdr]
    (for [line (line-seq rdr)]
        (let [[apikey timestamp entry] (str/split line #"\|" 3)]
            [apikey timestamp (json/read-str entry)])))

(with-twitter-data read-tweets)

Quick comparison

Idea           Python           Clojure
Binding        label = val      (let [label val])
Unpacking      val1, val2 =     Destructuring Form
Iteration      for              (for) macro
Functions      def              (defn) macro
File Open      open()           (io/reader)
String Split   str.split()      (str/split)
JSON Parse     json.loads()     (json/read-str)
Namespaces     Modules          (ns) macro
Imports        import           (ns (:require))
Data Structs   {} [] (,)        {} [] ()

Clojure unique stuff

Idea                     Python                   Clojure
Macros                   N/A                      Built-in
Immutable Data Structs   N/A                      Built-in
Keywords                 N/A                      Built-in
Lambdas / Blocks         Crippled                 Built-in
DSLs                     With Metaclasses         With Macros
Lazy Eval                Opt-in (generators)      Opt-out (doall)
Code as Data             import ast               Lisp forms
Platform Interop         via C, Cython, etc.      via JVM
Concurrency              Trad'l, processes        STM Impl
Object Orientation       Trad'l, class-based      Multi dispatch
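
The "Lazy Eval" row can be seen directly on the Python side. In this small sketch (the names numbered and produced are invented for illustration), a generator computes nothing until a consumer asks, and list() forces the rest, much as doall does in Clojure.

```python
produced = []  # records which items the generator has actually computed

def numbered(items):
    """Yield (index, item) pairs one at a time, on demand."""
    for idx, item in enumerate(items):
        produced.append(idx)
        yield idx, item

lazy = numbered([45, 23, 51])   # nothing is computed yet
before = list(produced)         # snapshot: still empty
first = next(lazy)              # computes exactly one pair
after_one = list(produced)      # snapshot: only index 0 so far
rest = list(lazy)               # forces the remainder, much like doall

print(before, first, after_one, rest)
```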

Other examples

EMR cluster (lemur)

(defcluster pig-cluster
    :master-instance-type "m1.large"
    :slave-instance-type "m1.large"
    :num-instances 2
    :keypair "emr_jobs"
    :enable-debugging? false
    :bootstrap-action.1 [
        "install-pig"
        (s3-libs "/pig/pig-script")
        ["--base-path" (s3-libs "/pig/")
        "--install-pig" "--pig-versions" "latest"]
    ]
    :runtime-jar (s3-libs "/script-runner/script-runner.jar")
)

EMR Pig steps (lemur)

(defstep twitter-count-step
    :args.positional [
        (s3-libs "/pig/pig-script")
        "--base-path" (s3-libs "/pig/")
        "--pig-versions" "latest"
        "--run-pig-script" "--args"
        "-f" "s3://pystorm/url_counts.pig"
    ]
)

(fire! pig-cluster twitter-count-step)

Twitter Click Spout (Storm)

{"twitter-click-spout"
    (shell-spout-spec
        ;; Python Spout implementation:
        ;; - fetches tweets (e.g. from Kafka)
        ;; - emits (urlref, url, ts) tuples
        ["python" "spouts_twitter_click.py"]
        ;; Stream declaration:
        ["urlref" "url" "ts"]
    )
}

Twitter Count Bolt (Storm)

{"twitter-count-bolt"
    (shell-bolt-spec
        ;; Bolt input: Spout and field grouping on urlref
        {"twitter-click-spout" ["urlref"]}
        ;; Python Bolt implementation:
        ;; - maintains a Counter of urlref
        ;; - increments as new clicks arrive
        ["python" "bolts_twitter_count.py"]
        ;; Emits latest click count for each tweet as new Stream
        ["twitter_link" "clicks"]
        :p 4
    )
}
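
The comments above describe what the Python bolt does: keep a Counter keyed by urlref, increment it per click, and emit the latest count. The sketch below shows only that counting logic. It is not the actual bolts_twitter_count.py, it leaves out the Storm multilang protocol entirely, and count_click and the sample tuples are hypothetical.

```python
from collections import Counter

def count_click(counts, urlref):
    """Increment the click count for urlref and return the tuple to emit."""
    counts[urlref] += 1
    # Matches the declared output fields: (twitter_link, clicks)
    return urlref, counts[urlref]

counts = Counter()

# Hypothetical (urlref, url, ts) tuples, shaped like the spout's output
incoming = [
    ("t.co/a", "http://example.com/1", 1),
    ("t.co/b", "http://example.com/2", 2),
    ("t.co/a", "http://example.com/1", 3),
]

emitted = [count_click(counts, urlref) for urlref, url, ts in incoming]
print(emitted)
```

A per-process counter like this is only correct when the same urlref always reaches the same bolt instance, which is what the field grouping on urlref provides.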

Running local Storm cluster

(defn run-local! []
    (let [cluster (LocalCluster.)]
        ;; submit the topology configured above
        (.submitTopology cluster
                        ;; topology name
                        "test-topology"
                        ;; topology config map
                        {TOPOLOGY-DEBUG true}
                        ;; topology definition (spouts and bolts)
                        (mk-topology))
        ;; sleep for 5 seconds before...
        (Thread/sleep 5000)
        ;; shutting down the cluster
        (.shutdown cluster)
    )
)

Command line parsing

(defn -main [& args]
    (let [[opts args banner]
        (cli args
            ["-h" "--help" "Show help"
                :flag true :default false]
            ["-v" "--verbose" "Verbose output"
                :flag true :default false]
            ["-s" "--spec" "Storm Topology spec .clj file"]
            ["-j" "--jar" "Storm Topology code .jar file"]
            ["-c" "--config" "Storm Environment config file"
                :default "config.json"]
            ["-d" "--debug" "Enable Storm Topology debugging"
                :default true]
            ["-e" "--env" "Environment, e.g. prod or local"
                :default "local"]
            )]
        (println opts args banner)))
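
For comparison, roughly the same options can be declared with Python's standard argparse module. This is a sketch for orientation only, not part of the project. build_parser is a made-up name, and -h/--help is left to argparse, which adds it automatically.

```python
import argparse

def build_parser():
    """Declare options mirroring the Clojure cli spec above."""
    parser = argparse.ArgumentParser(description="Storm topology runner (sketch)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose output")
    parser.add_argument("-s", "--spec", help="Storm Topology spec .clj file")
    parser.add_argument("-j", "--jar", help="Storm Topology code .jar file")
    parser.add_argument("-c", "--config", default="config.json",
                        help="Storm Environment config file")
    # As in the Clojure spec, --debug is not a flag: it takes a value
    parser.add_argument("-d", "--debug", default=True,
                        help="Enable Storm Topology debugging")
    parser.add_argument("-e", "--env", default="local",
                        help="Environment, e.g. prod or local")
    return parser

opts = build_parser().parse_args(["--spec", "topology.clj", "-v"])
print(opts)
```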

SQL table creation

(ns ring-sample.sqlite
    (:require   [clojure.java.jdbc :as jdbc]))

(def db {:classname   "org.sqlite.JDBC"
         :subprotocol "sqlite"
         :subname     "data/database.db"})

(defn make-table! []
    (try
        (jdbc/with-connection db
            (jdbc/create-table
                :accounts
                [:id :integer]
                [:apikey :text]
                [:name :text]
                [:seen :datetime]))
        (catch Exception exc (println exc))))

(make-table!)
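
A Python counterpart using the standard sqlite3 module might look like the sketch below. It is an illustration only: it uses an in-memory database instead of data/database.db so that it runs anywhere, and make_table is a hypothetical name mirroring make-table!.

```python
import sqlite3

def make_table(conn):
    """Create the accounts table, printing any database error."""
    try:
        conn.execute(
            "CREATE TABLE accounts ("
            "id INTEGER, apikey TEXT, name TEXT, seen DATETIME)"
        )
    except sqlite3.Error as exc:
        print(exc)

# In-memory database, an assumption made so the sketch is self-contained
conn = sqlite3.connect(":memory:")
make_table(conn)

# Read back the column names to confirm the table exists
columns = [row[1] for row in conn.execute("PRAGMA table_info(accounts)")]
print(columns)
```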

lein

lein is like venv, pip, ipython, setuptools, and buildout all combined in one tool.

Through plugins, it can also run the build tasks that you'd normally put in a Makefile.

It also includes "project quickstarts", similar to Django quickstart and cookiecutter.

lein is written in Clojure, yet you also install Clojure itself through lein, by declaring a dependency on Clojure in your project.clj file.

project.clj example

;;            project            version
;;               ^                  ^
(defproject parsely-stormtest "0.0.1-SNAPSHOT"
    ;; code locations
    :source-paths ["src/clj"]
    :resource-paths ["multilang", "data"]

    ;; project dependencies
    :dependencies   [[org.apache.storm/storm-core "0.9.1"]
                     [org.clojure/clojure "1.5.1"]
                     [org.clojure/data.json "0.2.4"]
                     [org.clojure/tools.namespace "0.2.4"]]

    ;; invoked by lein run
    :main parsely.stormtest

    ;; lein compile options
    :min-lein-version "2.0.0"
    :aot :all
)

Running

Functionality                     How
Just the Clojure REPL             java -jar clojure.jar
Clojure REPL w/ dependencies      lein repl
CLI entry-point w/ dependencies   lein run

Using

Functionality         How
Eval code in editor   nREPL plugins for vim / emacs / SublimeText
Run-debug loop        Repeated lein run calls at CLI, or tests
Interact with code    LightTable embeds Clojure really nicely
Code notebook         session is a project akin to IPython Notebook

Testing

Functionality         How
Unit test framework   Use clojure.test, then lein test
Test-driven / BDD     midje, speclj, expectations
Parametric            simulant, test.check
Assertions            (is (.startsWith "abcde" "ab"))

Go forth!

Install lein:

Read a little bit more about Clojure: