Great Answers to
Questions About Everything


I am learning clojure and decided to start out by trying to write a solution to a fairly simple algorithm, reservoir sampling. As I stated, I am learning clojure specifically and problem solving in a functional language in general. Can someone please take a look at my code and critique it on it's "clojureness". Am I using the right idiomatic conventions, is there a way that performs better (and why), formatting, anything really.

(defn sample-seq [size data]
  (loop [sample (transient (vec (take size data)))
         idx size
         data (drop size data)]
    (if (empty? data)
      (persistent! sample)
      (let [rand-num (rand-int idx)
            new-sample (if (< rand-num size) 
                         (assoc! sample rand-num (first data)) 
        (recur new-sample (inc idx) (rest data))))))
(println (sample-seq 4 [2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0]))

{ asked by MGoDave }


I think your code is pretty readable and looks like idiomatic enough Clojure code¹. So from a readability standpoint your code seems fine. However performing assoc on a vector of length n takes O(log n) time, so your runtime will be in O(n log n) as opposed to O(n), which an imperative implementation would be in.

However there's not much you can do about this other than perhaps using java arrays imperatively, but that would be very unidiomatic Clojure code. And O(n log n) isn't that bad (definitely not as bad as the O(n^2) I incorrectly claimed before).

(Note that my previous note about transients was wrong as assoc! on a transient vector has the same runtime complexity as on a persistent vector).

¹ Usually you'd avoid index-based loops wherever possible, but the nature of the algorithm makes that pretty much impossible here.

{ answered by sepp2k }