I am learning clojure and decided to start out by trying to write a solution to a fairly simple algorithm, reservoir sampling. As I stated, I am learning clojure specifically and problem solving in a functional language in general. Can someone please take a look at my code and critique it on it's "clojureness". Am I using the right idiomatic conventions, is there a way that performs better (and why), formatting, anything really.
(defn sample-seq [size data] (loop [sample (transient (vec (take size data))) idx size data (drop size data)] (if (empty? data) (persistent! sample) (let [rand-num (rand-int idx) new-sample (if (< rand-num size) (assoc! sample rand-num (first data)) sample)] (recur new-sample (inc idx) (rest data)))))) (println (sample-seq 4 [2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0]))
I think your code is pretty readable and looks like idiomatic enough Clojure code¹. So from a readability standpoint your code seems fine. However performing
assoc on a vector of length
O(log n) time, so your runtime will be in
O(n log n) as opposed to
O(n), which an imperative implementation would be in.
However there's not much you can do about this other than perhaps using java arrays imperatively, but that would be very unidiomatic Clojure code. And
O(n log n) isn't that bad (definitely not as bad as the
O(n^2) I incorrectly claimed before).
(Note that my previous note about transients was wrong as
assoc! on a transient vector has the same runtime complexity as on a persistent vector).
¹ Usually you'd avoid index-based loops wherever possible, but the nature of the algorithm makes that pretty much impossible here.Tweet