Clojure training with lambda next · Day 3 - olange/learning-clojure GitHub Wiki

## Protocols

Like interfaces in Java, with single dispatch on the type of the 1st argument. Open: existing classes can be extended to existing protocols. Wrappers screw up values (a wrapped object is no longer the original value); protocols dispatch, they don't wrap.

Protocols should be internal to libraries, NOT part of the API: they are not meant for consumers (callers). It is rather a low-level feature. «I would not design around this functionality», says Christophe.

(defprotocol Drinkable
  ;; these functions will be defined in the current namespace
  (drink [x] [x how-fast] "Drink the thing")  ;; multiple signatures (arities)
  (refill [x]))

(extend-protocol Drinkable  ;; the protocol we are extending
  String                    ;; can we drink a String? :o)
  (drink
    ([x] :this-string-is-not-that-tasty)
    ([x how-fast] :even-worse-at-speed))
  (refill [x] (str x x))

  nil                       ;; protocols can be extended to nil too
  (drink
    ([x] nil)
    ([x how-fast] nil))
  (refill [x] []))

(drink "coffee")              ;; => :this-string-is-not-that-tasty
(drink "coffee" :in-a-hurry)  ;; => :even-worse-at-speed
(refill "coffee")             ;; => "coffeecoffee"

Extend protocols to classes only, not to other interfaces. If a class is extended to many protocols, you can also write it from the class point of view (under the hood, extend-protocol and extend-type both use extend):

(extend-type String
  Protocol1
  ...
  Protocol2
  ...)
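A concrete, runnable version of the skeleton above, as a sketch: Pourable and Describable are hypothetical protocols made up for this example, not part of the course material.

```clojure
;; Two hypothetical protocols, just to show extend-type grouping
(defprotocol Pourable
  (pour [x]))

(defprotocol Describable
  (describe [x]))

;; One extend-type form, written from the class point of view:
;; each protocol name is followed by its method implementations.
(extend-type String
  Pourable
  (pour [s] (str "pouring " s))

  Describable
  (describe [s] (str "a string of length " (count s))))

(pour "coffee")      ;; => "pouring coffee"
(describe "coffee")  ;; => "a string of length 6"
```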

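Other stuff: extending to maps (via their base class) and to Object as a catch-all fallback. Only refill is implemented here; a protocol doesn't have to be fully implemented: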

(extend-protocol Drinkable
  clojure.lang.APersistentMap
  (refill [x] (assoc x :refilled true))

  Object
  (refill [x] :default-behaviour))
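A self-contained sketch of the fallback behaviour, using a hypothetical protocol Refillable with a method top-up (names chosen so as not to clash with Drinkable). Dispatch picks the most specific class, so maps use their own implementation and everything else falls back to Object; nil is not covered by Object and would need an extension of its own.

```clojure
(defprotocol Refillable   ;; hypothetical protocol
  (top-up [x]))

(extend-protocol Refillable
  clojure.lang.APersistentMap
  (top-up [m] (assoc m :refilled true))

  Object
  (top-up [_] :default-behaviour))

(top-up {:size :small})  ;; => {:size :small, :refilled true}
(top-up 42)              ;; => :default-behaviour
(top-up "coffee")        ;; => :default-behaviour

;; nil has no implementation here: the call throws
(try (top-up nil)
     (catch IllegalArgumentException _ :no-implementation-for-nil))
;; => :no-implementation-for-nil
```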

## Records

(defrecord Coffee
  [bean with-milk? temperature])

(def latte
  (->Coffee :arabica true 80))

latte
(:bean latte)
(assoc latte :temperature 100)
(update-in latte [:temperature] #(* 2 %))
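(assoc latte :froth false)   ;; still a Coffee: extra keys are kept alongside the fields
(dissoc latte :temperature)  ;; a plain map: without a declared field it can't stay a Coffee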

(Coffee. :arabica true 80)
(->Coffee :arabica true 50)
(map->Coffee {:bean :arabica :with-milk? true :temperature 50})

Prefer the ->Coffee factory function over the Coffee. constructor: code calling ->Coffee picks up the new class when the record definition changes, whereas compiled code using Coffee. stays linked to the previous class. ->Coffee is also a plain function, so it can be passed to map or apply.
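A sketch of the three ways to build a record, using a hypothetical Cup record with the same fields as Coffee. It also checks the assoc/dissoc difference from the calls above: adding a key keeps the record type, removing a declared field gives back a plain map.

```clojure
(defrecord Cup [bean with-milk? temperature])   ;; hypothetical record, same shape as Coffee

(def cup-a (Cup. :arabica true 80))             ;; Java constructor interop
(def cup-b (->Cup :arabica true 80))            ;; positional factory fn
(def cup-c (map->Cup {:bean :arabica :with-milk? true :temperature 80}))  ;; map factory fn

(= cup-a cup-b cup-c)   ;; => true: same record type, same field values

;; ->Cup is an ordinary function, so it composes with map
(map ->Cup [:arabica :robusta] [true false] [80 60])   ;; two Cup records

;; assoc of a new key keeps a Cup; dissoc of a declared field does not
(record? (assoc cup-b :froth false))    ;; => true
(record? (dissoc cup-b :temperature))   ;; => false
```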

As records are regular classes, you can extend protocols to them:

(extend-type Coffee
  Drinkable  ;; the protocol we are extending
  (drink
    ([x] :this-is-too-cold)
    ([x how-fast] :better-drink-it-fast))
  (refill [x] :you-ve-had-enough-coffee))

(drink latte)
(refill latte)

### Inline extensions

You can also implement protocols inline when defining a record; the implementation is then baked into the class definition (the record class implements the protocol's interface directly):

(defrecord Tea [leaf temperature with-milk?]
  Drinkable
  (drink [x] "tasteful leaves")          ;; inline: one method form per arity
  (drink [x y] "drink it slowly")
  (refill [x] "tea is better refilled")

  clojure.lang.IFn   ;; makes the record callable as a function
  (invoke [x y] (get x y))) ;; method implementations can't take variadic (&) argument lists

(def morning-tea (->Tea :oolong 60 :never))

(morning-tea :leaf)

(refill morning-tea)

(use 'clojure.pprint)
(pprint (ancestors Tea))
(pprint (ancestors Coffee))
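The ancestors listing hints at the difference between the two styles. A sketch, assuming a hypothetical protocol Brewable and two hypothetical records: both satisfy the protocol, but only the one with the inline implementation actually implements the protocol's generated Java interface; the extend-type one goes through the protocol's own lookup. Note that fields are not in scope inside extend-type, so they are read with keywords.

```clojure
(defprotocol Brewable        ;; hypothetical protocol
  (brew [x]))

(defrecord InlineTea [leaf]  ;; implemented inline: baked into the class
  Brewable
  (brew [_] (str "brewing " (name leaf))))

(defrecord LooseTea [leaf])  ;; extended afterwards
(extend-type LooseTea
  Brewable
  (brew [t] (str "brewing loose " (name (:leaf t)))))

(satisfies? Brewable (->InlineTea :oolong))  ;; => true
(satisfies? Brewable (->LooseTea :oolong))   ;; => true

(defn implements-brewable-interface?
  "True when the generated Brewable interface is among the class's ancestors."
  [^Class c]
  (boolean (some #(= "Brewable" (.getSimpleName ^Class %)) (ancestors c))))

(implements-brewable-interface? InlineTea)  ;; => true
(implements-brewable-interface? LooseTea)   ;; => false
```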

Christophe likes to use records because they pack data efficiently in memory. He's not using them to model business entities. Edmund used them for Java interop.

## Reify

reify creates an anonymous object implementing any interfaces/protocols; like a closure, its methods close over the surrounding lexical scope.

(def my-runnable
  (let [x   :some-stuff
        out *out*]   ;; capture the current output stream (e.g. the REPL's)
    (reify Runnable
      (run [_]
        ;; the new thread would otherwise print to the root *out*
        (binding [*out* out]
          (println x))))))

(-> my-runnable Thread. .start)
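A reify sketch against a protocol rather than a Java interface, assuming a hypothetical Sippable protocol: each call to the hypothetical cup-of returns a new anonymous object closing over its liquid argument.

```clojure
(defprotocol Sippable        ;; hypothetical protocol
  (sip [x] [x how-fast]))

(defn cup-of
  "Returns an anonymous Sippable object closing over `liquid`."
  [liquid]
  (reify Sippable
    ;; each arity is written as a separate method, as in deftype/defrecord
    (sip [_] (str "sipping " liquid))
    (sip [_ how-fast] (str "sipping " liquid ", " (name how-fast)))))

(def espresso (cup-of "espresso"))
(sip espresso)              ;; => "sipping espresso"
(sip espresso :in-a-hurry)  ;; => "sipping espresso, in-a-hurry"
```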

About the binding and the earmuffs: in the early days of Clojure, every var was dynamic and could be rebound (even core fns like map, for debugging purposes or otherwise). Since Clojure 1.3 vars are static by default, and one uses ^:dynamic metadata to define a dynamic var.

Earmuffs around a var name are a naming convention signalling that it is dynamic and may be rebound; the compiler warns if one defines a non-dynamic var with earmuffs:

(def *i-wanna-be-dynamic* 3) => compiler warning (not declared dynamic, but its name suggests otherwise)
(def ^:dynamic *i-wanna-be-dynamic* 3)
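A sketch of what ^:dynamic changes in practice, using two hypothetical vars: binding works only on vars marked dynamic and throws at runtime for the others.

```clojure
(def ^:dynamic *strength* :medium)   ;; hypothetical dynamic var
(def cup-size :tall)                 ;; hypothetical ordinary (static) var

(.isDynamic #'*strength*)   ;; => true
(.isDynamic #'cup-size)     ;; => false

(binding [*strength* :strong] *strength*)   ;; => :strong

;; binding a var that is not marked ^:dynamic fails at runtime
(try
  (binding [cup-size :venti] cup-size)
  (catch IllegalStateException e (.getMessage e)))
;; => "Can't dynamically bind non-dynamic var: ..."
```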

### Dynamic scope vs Lexical scope

(def f (fn [] *i-wanna-be-dynamic*))

(f) => 3 ;; the root value, since no binding is in effect

(binding [*i-wanna-be-dynamic* 42]
  (f)) => 42

Kind of evil; beware this misconception: binding is dynamic, not lexical, so a fn created inside a binding form does not capture that binding:

(def f (binding [*i-wanna-be-dynamic* 3]
  (fn [] *i-wanna-be-dynamic*)))

(binding [*i-wanna-be-dynamic* 42]
  (f)) => 42 not 3
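The usual way out of this is clojure.core/bound-fn, which captures the bindings in effect when the fn is created and reinstalls them when it is called (the same problem the out *out* trick in the reify example works around). A sketch with a hypothetical var *mood*:

```clojure
(def ^:dynamic *mood* :root)   ;; hypothetical dynamic var

;; plain fn: reads whatever binding is in effect when it is *called*
(def plain (binding [*mood* :captured] (fn [] *mood*)))

;; bound-fn: reinstalls the bindings in effect when it was *created*
(def conveyed (binding [*mood* :captured] (bound-fn [] *mood*)))

(binding [*mood* :caller]
  [(plain) (conveyed)])   ;; => [:caller :captured]
```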

Listing the current thread bindings, and the public vars of the current namespace:

(pprint (take 10 (get-thread-bindings)))
(pprint (take 10 (ns-publics *ns*)))

## Types

(deftype Biscuit
  [age jamminess oatiness]) ;; defines a class with these fields and nothing more

(def biscuit1 (Biscuit. 0.3 false true))

(:oatiness biscuit1) ;; nil! a deftype has no map-like keyword lookup
(.oatiness biscuit1) ;; => true, plain Java field access works

The feature that separates deftype from defrecord is that type fields can be made mutable.

(defprotocol Age
  (get-older [x]))

(deftype Biscuit
  [^:volatile-mutable age jamminess oatiness]
  Age
  (get-older [x] (set! age (inc age))))  ;; setting the Java object instance field 'age'

(def biscuit2 (Biscuit. 1.0 false true))

(get-older biscuit2) => 2.0

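Volatile only guarantees that writes are visible to other threads; (set! age (inc age)) is still a read followed by a write, so concurrent calls can lose updates. You actually have to handle locking: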

(deftype Biscuit
  [^:volatile-mutable age jamminess oatiness]
  Age
  (get-older [x]
    (locking x
      (set! age (inc age)))))
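A sketch checking that the locked version does not lose updates under contention, assuming a hypothetical Ticker protocol and LockedTicker type built like the locked Biscuit: 8 futures increment the same instance 1000 times each.

```clojure
(defprotocol Ticker              ;; hypothetical protocol
  (tick! [t] "Increments the count and returns the new value."))

(deftype LockedTicker [^:volatile-mutable n]
  Ticker
  (tick! [this]
    ;; volatile gives visibility, the lock makes read-increment-write atomic
    (locking this
      (set! n (inc n)))))

(def ticker (LockedTicker. 0))

;; start 8 futures (doall), then wait for each of them to finish
(run! deref (doall (repeatedly 8 #(future (dotimes [_ 1000] (tick! ticker))))))

(tick! ticker)   ;; => 8001, no increment lost
```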

### Reimplementing delay using deftype

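Whether the delay has been evaluated is indicated by whether it still holds a fn: f is set to nil once it has run. Since the fields are only accessed while holding the lock, they can be unsynchronized mutable: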

(deftype Delay
  [^:unsynchronized-mutable f
  ^:unsynchronized-mutable result]

  clojure.lang.IDeref ;; Clojure's built-in interface behind deref and @
  (deref [d]
    (locking d
      (when f
        (set! result (f))
        (set! f nil))
      result)))

(defn slow-drink []
  (prn "About to take a drink")
  (Thread/sleep 1000)
  (prn "Drink consumed"))

(def delayed-slow-drink
  (Delay. slow-drink nil))

@delayed-slow-drink ;; runs slow-drink on the first deref only; later derefs return the cached nil at once

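Even better: double-checked locking, so that deref no longer takes the lock once the value is computed. Because f is now read outside the lock, it has to be volatile (result is written before f is cleared, so a thread that sees f as nil also sees result):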

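(deftype Delay
  [^:volatile-mutable f              ;; volatile: it is read outside the lock below
   ^:unsynchronized-mutable result]

  clojure.lang.IDeref
  (deref [d]
    (if f                            ;; fast path: no locking once computed
      (locking d
        (when f                      ;; recheck in case another thread computed it meanwhile
          (set! result (f))
          (set! f nil))
        result)
      result)))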
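For comparison, Clojure's built-in delay macro does the same job: it wraps its body in a fn and builds a clojure.lang.Delay, which is exactly the boilerplate discussed in the macro section. A sketch counting evaluations with a hypothetical atom runs:

```clojure
(def runs (atom 0))   ;; counts how often the body actually runs

(def lazy-drink
  (delay
    (swap! runs inc)
    :drink-consumed))

(realized? lazy-drink)   ;; => false, nothing has run yet
@lazy-drink              ;; => :drink-consumed
@lazy-drink              ;; => :drink-consumed, cached
@runs                    ;; => 1
(realized? lazy-drink)   ;; => true
```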

## Macros

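To avoid the boilerplate in (Delay. slow-drink nil), one can define a macro.

Assuming ddelay is a small helper fn wrapping the constructor, such as (defn ddelay [f] (Delay. f nil)), we want to produce this: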

(ddelay
  (fn []
    (prn "About to take a drink")
    (Thread/sleep 1000)
    (prn "Drink consumed")))

Given this input:

(mdelay
  (prn "About to take a drink")
  (Thread/sleep 1000)
  (prn "Drink consumed"))

We can write simple macros using the back-tick ` (syntax quote) and a code template: tilde ~ (unquote) inserts a named placeholder, and tilde-at ~@ (unquote-splicing) does the same but splices a sequence of forms (such as the & body of a macro) into the surrounding list:

(defmacro mdelay [& body]
  `(ddelay
    (fn []
      ~@body))) ;; spliced: the body forms are inserted one by one

(macroexpand 
  '(mdelay
    (prn "About to take a drink")
    (Thread/sleep 1000)
    (prn "Drink consumed")))

### Implementing the imply operator (=>)

Our new imply operator is =>. a => b (a implies b) is (not a) or b, with this truth table:

| a | b | a => b |
|---|---|--------|
| 1 | 1 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 1 |
| 0 | 0 | 1 |

Let's start implementing it as a function:

(defn implies [a b]
  (or (not a) b))

(implies false
         (do (prn "hello")   ;; printed even though a is false: args are evaluated first
             true))

As function arguments are all evaluated before the call in Clojure, there is no way to short-circuit evaluation with a function. But wait, with a macro we can have short-circuiting:

(defmacro => [a b]
  `(or (not ~a) ~b))

(macroexpand-1 '(=> false true))
=> (clojure.core/or (clojure.core/not false) true)
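A quick check that the macro version really short-circuits, as a sketch: the hypothetical atom evaluated? records whether the b form ran.

```clojure
(defmacro => [a b]
  `(or (not ~a) ~b))

(def evaluated? (atom false))

;; a is false, so (not a) is true and or never evaluates b
(=> false (reset! evaluated? true))   ;; => true
@evaluated?                           ;; => false

;; a is true, so b is evaluated and its value is returned
(=> true (reset! evaluated? true))    ;; => true
@evaluated?                           ;; => true
```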

... because or itself is short-circuiting:

(source or)
=> (defmacro or
     "Evaluates exprs one at a time, from left to right. If a form
     returns a logical true value, or returns that value and doesn't
     evaluate any of the other expressions, otherwise it returns the
     value of the last expression. (or) returns nil."
     {:added "1.0"}
     ([] nil)
     ([x] x)
     ([x & next]
        `(let [or# ~x]
           (if or# or# (or ~@next)))))

### About templating, code substitution and splicing

Without ~, the template contains the literal symbol x, which syntax quote resolves to user/x instead of using the macro's argument:

(defmacro small-macro [x]
  `(prn x))

(small-macro 3) => CompilerException: No such var: user/x

(macroexpand-1 '(small-macro 3))
=> (clojure.core/prn user/x)

Use the tilde ~ to substitute the actual argument (the form passed to the macro):

(defmacro small-macro [x]
  `(prn ~x))

(small-macro 3) => prints 3

Substituting a list of forms with ~ inserts it as one nested list:

(defmacro small-macro [& xs]
  `(do ~xs))

(small-macro (prn "hello") (prn "bye")) ;; prints both, then fails: the nil returned by the first prn ends up in function position

(macroexpand-1 '(small-macro (prn "hello") (prn "bye")))
=> (do ((prn "hello") (prn "bye")))

Splice the forms in with ~@ and it'll be fine:

(defmacro small-macro [& xs]
  `(do ~@xs))

(macroexpand-1 '(small-macro (prn "hello") (prn "bye")))
=> (do (prn "hello") (prn "bye"))

It is considered bad practice to define names from within a macro: new symbol names should only be provided by the user. If you nevertheless need a local to store the result of a computation, append a pound # to its name inside the syntax quote (auto-gensym): Clojure replaces it with a unique generated symbol, so it can't accidentally shadow/capture a symbol from the user's code. Below, major stores the value of a so that it is evaluated only once:

(defmacro major [a b]
  `(let [inta# ~a]          ;; a is evaluated only once
     (if inta# inta# ~b)))

(defmacro => [a b]
  `(major (not ~a) ~b))

(=> (do (prn "hello")
        false)
    (do (prn "hello again")
        false))  ;; prints "hello" only and returns true
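A sketch of auto-gensym on its own, with a hypothetical two-argument either macro equivalent to major: the expansion shows the generated symbol (its number differs from run to run), and a counter confirms that a is evaluated once.

```clojure
(defmacro either
  "Like a two-argument or: evaluates a once, returns it if truthy, else b."
  [a b]
  `(let [v# ~a]          ;; v# becomes a unique symbol such as v__1234__auto__
     (if v# v# ~b)))

(macroexpand-1 '(either (expensive) :fallback))
;; => (clojure.core/let [v__1234__auto__ (expensive)]
;;      (if v__1234__auto__ v__1234__auto__ :fallback))   ;; number varies

(def evaluations (atom 0))
(either (do (swap! evaluations inc) :first) :fallback)   ;; => :first
@evaluations                                             ;; => 1, a was evaluated once
```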

### Anaphoric macros

(if-it (+ 2 2) (prn it "is true") :false): I didn't follow this part, as I was playing with the previous stuff in the REPL.
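A possible if-it, as a sketch (not necessarily the version shown in the course): an anaphoric macro deliberately captures a symbol, here it, by inserting it with ~'it, so that syntax quote neither namespace-qualifies nor gensyms it.

```clojure
(defmacro if-it
  "Anaphoric if: binds the value of test to the symbol `it`, visible in both branches."
  [test then else]
  ;; ~'it inserts the bare symbol it: deliberate capture
  `(let [~'it ~test]
     (if ~'it ~then ~else)))

(if-it (+ 2 2) (prn it "is true") :false)   ;; prints 4 "is true"
(if-it (+ 2 2) (* it 10) :false)            ;; => 40
(if-it (seq []) it :false)                  ;; => :false
```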

## Lunch

About the macro stuff: «People come for macros in Clojure and stay for immutability.» and «I'm mostly using the basic stuff of Clojure, and come to macros or protocols for edgy things.», says Edmund.

About performance and profiling: «Get memory usage under control first, then go for CPU performance. Reducers and transient/persistent! are your friends for efficient memory usage», says Christophe.

Sequences are mostly used to build computational pipelines, but they build up a lot of intermediary values, which use memory. Reducers avoid that.
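A sketch of transient/persistent! on their own, with a hypothetical squares fn: the vector is built through one mutable transient and frozen once at the end, instead of allocating a new persistent vector version at every conj.

```clojure
(defn squares
  "Builds a vector of the first n squares, mutating a transient along the way."
  [n]
  (persistent!                      ;; freeze the result back into a persistent vector
    (reduce (fn [acc i] (conj! acc (* i i)))
            (transient [])          ;; editable version of the empty vector
            (range n))))

(squares 5)   ;; => [0 1 4 9 16]
```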

## Reducers

In the beginning there were sequences... (afternoon stand-up here)

(->> [1 2 3 4]
     (filter even?)
     (map inc)
     (into #{}))

We are creating a new collection type from nothing but plain functions; it's a black box, and the only way to get at its content is to perform an operation on it (reduce over it).

A «functional collection» is a function of two arguments: a function and an initial value.

;; fcoll is a fn from f * init -> result

;; fnil is the empty collection
;; (this shadows clojure.core/fnil, so Clojure prints a warning)

(defn fnil [f init]
  init)

;; we put a new element in front of our functional collection
(defn fcons [fcoll x]
  (fn [f init]
    (fcoll f (f init x))))

(defn freduce [f init fcoll]
  (fcoll f init))

;; same idea, redefined with the head first and written with freduce
(defn fcons [head tail]
  (fn [f init]
    (freduce f (f init head) tail)))

;; we ask the empty collection to accumulate in a vector
(freduce conj [] fnil) => []

(freduce conj [] (fcons 3 fnil)) => [3]
(freduce conj [] (fcons 4 (fcons 3 fnil))) => [4 3]


(defn fcat [coll1 coll2]
  (fn [f init]
    (freduce f (freduce f init coll1) coll2)))

(defn fmap [mf coll]
  (fn [f init]
    (freduce
      (fn [acc x]
        (f acc (mf x)))
      init coll)))

(fmap inc (fcons 1 (fcons 2 fnil)))
(freduce conj [] *1)


(defn ffilter [pred coll]
  (fn [f init]
    (freduce
      (fn [acc x]
        (if (pred x)
          (f acc x)
          acc))
      init coll)))

(ffilter odd? (fmap inc (fcons 1 (fcons 2 fnil))))
(freduce conj [] *1)

;; there is a pattern here, lets factor out the freduce function constructor

(defn reducer [xf coll]
  (fn [f init]
    (freduce (xf f) init coll)))

(defn fmap [mf coll]
  (reducer
    (fn [f]
      (fn [acc x]
        (f acc (mf x))))
    coll))

(defn ffilter [pred coll]
  (reducer
    (fn [f]
      (fn [acc x]
        (if (pred x)
          (f acc x)
          acc)))
    coll))

(defn fcoll [coll]  ;; adapts an ordinary Clojure collection into a functional collection
  (fn [f init]
    (reduce f init coll)))

(defn fmapcat [mf coll]  ;; mf must return a functional collection
  (reducer
    (fn [f]
      (fn [acc x]
        (freduce f acc (mf x))))
    coll))
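The toy reducer, fmap, ffilter and fmapcat above mirror what clojure.core.reducers provides. As a sketch with r/map, r/filter and r/mapcat: these return reducible collections that only transform the reducing function, so the pipeline from the start of this section runs as a single reduce, without intermediate lazy sequences.

```clojure
(require '[clojure.core.reducers :as r])

;; Same result as the ->> example at the start of the Reducers section
(into #{} (r/map inc (r/filter even? [1 2 3 4])))   ;; => #{3 5}

;; r/mapcat plays the role of fmapcat: each element expands into a collection
(into [] (r/mapcat (fn [x] [x x]) [1 2 3]))          ;; => [1 1 2 2 3 3]

;; a reducible can also be reduced directly, like freduce on a functional collection
(reduce + 0 (r/map inc [1 2 3]))                     ;; => 9
```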

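Folding: r/fold splits the collection into chunks (here of 5000 elements), reduces the chunks in parallel with a reducing function (conj below), then merges the partial results with an associative combining function; called with no arguments, the combining function must return the seed for each chunk: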

(def v (vec (take 1e7 (cycle [0 1 2 3 4 5 6]))))

(require '[clojure.core.reducers :as r])

(dotimes [_ 5]
  (time (reduce conj #{} v)))

(dotimes [_ 5]
  (time (r/fold 5000                 ;; chunk size
          (fn
            ([] #{})                 ;; seed for each chunk
            ([a b] (into a b)))      ;; merges two partial sets
          conj                       ;; reduces within a chunk
          v)))
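Two shorter ways to write a fold, as a sketch: r/monoid builds the combining function from a merge fn and a constructor for the seed, and with a single function (+ here) fold uses it both to combine and to reduce.

```clojure
(require '[clojure.core.reducers :as r])

;; (into a b) merges partial sets, (hash-set) with no args gives the seed #{}
(r/fold 2 (r/monoid into hash-set) conj [0 1 2 0 1 2 3])   ;; => #{0 1 2 3}

;; single fn: (+) => 0 is the seed, (+ a b) merges partial sums
(r/fold + (vec (range 100)))                               ;; => 4950
```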

## Profiling

Basic Game of life implementation, using sequences and idiomatic Clojure

(defn neighbours [[x y]]
  (for [dx [-1 0 1]
        dy (if (zero? dx)
              [-1 1]
              [-1 0 1])]
    [(+ x dx) (+ y dy)]))

(defn step [cells]
  (set
    (for [[cell n] (frequencies (mapcat neighbours cells))
      :when (or (= n 3) (and (= n 2) (cells cell)))]
      cell)))

(def blinker #{ [1 0] [0 0] [-1 0]})

(step blinker)
(= (step (step blinker)) blinker)

(def world
  (set (take 500
    (repeatedly (fn [] [(rand-int 50) (rand-int 50)])))))

(dotimes [_ 5]
  (time (nth (iterate step world) 1000)))
=> "Elapsed time: 6783.69 msecs"
"Elapsed time: 6153.174 msecs"
"Elapsed time: 5994.15 msecs"
"Elapsed time: 6011.968 msecs"
"Elapsed time: 5904.513 msecs"

By the way, a trick to debug and peek within the core of an expression is to use (doto (... expr ...) prn):

(doto (frequencies (mapcat neighbours cells)) ;; e.g. inside step, where cells is bound
      prn)                                    ;; prints the value, returns it unchanged

Now Christophe is profiling with Java VisualVM, connected to the JVM that Eclipse CCW started (hint: don't bother with the profiler from the Eclipse Foundation, one never gets anything out of it). Run this on the command line:

~ $ jvisualvm

Within the profiler's Memory settings, check Record allocation stack traces. Start profiling memory usage, run the program and take a snapshot. Add the Total Alloc. Obj. column to the snapshot view (rather than looking at Live Bytes / Live Objects).

Looking at the memory usage (Total Alloc. Obj.), one can see that LazySeqs take quite a lot of memory. Right click on the entry and select Show allocation stack traces to further investigate.

We can do something about it by using records instead:

(defrecord Cell [x y])

(defn neighbours [cell]
  (for [dx [-1 0 1]
        dy (if (zero? dx)
              [-1 1]
              [-1 0 1])]
    (Cell. (+ (:x cell) dx) (+ (:y cell) dy))))

(defn step [cells]
  (set
    (for [[cell n] (frequencies (mapcat neighbours cells))
      :when (or (= n 3) (and (= n 2) (cells cell)))]
      cell)))

(def world
  (set (take 500
    (repeatedly (fn [] (Cell. (rand-int 50) (rand-int 50)))))))

(dotimes [_ 5]
  (time (nth (iterate step world) 1000)))

Now we see memory usage has dropped significantly.

The next thing using memory is the record equality comparison and hash code calculation. Christophe goes on to use deftype, reimplementing equality and hashing through Java interop.
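A sketch of what a deftype-based Cell could look like; the 31-based hash formula is an assumption, not Christophe's code. It overrides Object's equals and hashCode, which is what Clojure's hash sets and maps fall back on for objects that aren't Clojure collections.

```clojure
(deftype Cell [x y]   ;; plain fields, no map behaviour
  Object
  (equals [_ other]
    ;; equal when other is also a Cell with the same coordinates
    (and (instance? Cell other)
         (= x (.-x ^Cell other))
         (= y (.-y ^Cell other))))
  (hashCode [_]
    ;; simple 31-based combination of the two coordinates, truncated to an int
    (unchecked-int (+ (* 31 (long x)) (long y)))))

(= (Cell. 1 2) (Cell. 1 2))                                  ;; => true
(count (set [(Cell. 1 2) (Cell. 1 2) (Cell. 0 0)]))          ;; => 2
(get (frequencies [(Cell. 1 2) (Cell. 1 2)]) (Cell. 1 2))    ;; => 2
```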

Fun: the JVM caches boxed Long values from -128 to 127 (Long.valueOf returns the same object each time); as (range) starts at 0, the first 128 boxed longs are identical:

(count (take-while true? (map identical? (range) (range))))
=> 128
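The same cache shows up directly with Long/valueOf, as a sketch. Its javadoc only guarantees caching for -128 to 127, so identical? must not be relied on for other values; compare numbers with = instead.

```clojure
;; boxing the same small long twice yields the very same object
(identical? (Long/valueOf 127) (Long/valueOf 127))     ;; => true
(identical? (Long/valueOf -128) (Long/valueOf -128))   ;; => true

;; value equality works regardless of the cache
(= (Long/valueOf 1000) (Long/valueOf 1000))            ;; => true
```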

### Beware the JVM settings when profiling

The JVM started from Eclipse CCW has correct settings to do such profiling.

Beware when profiling from a Leiningen REPL: results are less meaningful by default, because Leiningen passes JVM options that favour fast startup over peak performance (notably -XX:TieredStopAtLevel=1, which limits JIT compilation). One has to set the JVM options in project.clj before profiling or benchmarking in the REPL (with Criterium, for instance).
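A project.clj sketch for that, relying on Leiningen's ^:replace metadata, which replaces its default :jvm-opts instead of merging with them; the project name and Clojure version are placeholders.

```clojure
;; project.clj (placeholder project name and version)
(defproject profiling-demo "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.3"]]
  ;; ^:replace discards Leiningen's default JVM options (tuned for fast
  ;; startup) instead of merging with them; add your own flags here
  :jvm-opts ^:replace [])
```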

### In general

Try to optimize locally first (e.g. with reducers) before restructuring the rest of your code for performance.

Get memory usage under control first, then go for CPU performance. Memory usage pressures the GC, which will use CPU.

Reducers and transient/persistent! are your friends for efficient memory usage. Sequences build up a lot of intermediary values, which use memory; reducers avoid that.