#### QUESTION

It is a very common problem that given a distance function $d(p_1,p_2)$ and a set of points `pts`

, we need to construct a matrix `mat`

so that `mat[[i,j]] == d[ pts[[i]], pts[[j]] ]`

.

What is the most efficient way to do this in Mathematica?

Let's assume that the points are in $\mathbb{R}^n$ for simplicity, and because that's the case I'm dealing with now, but theoretically the points could be any type of object, e.g. strings with $d$ being an edit distance.

For the specific problem I have right now I need to calculate the `EuclideanDistance`

and `ManhattanDistance`

of 2D points.

The simplest way to do this is

```
pts = RandomVariate[NormalDistribution[], {1000, 2}];
mat = Outer[ManhattanDistance, pts, pts, 1]; // AbsoluteTiming
(* ==> {0.595327, Null} *)
```

This obviously calculates all distances twice, which is wasteful. So I was hoping for an easy $2\times$ speedup, but it isn't as easy as one would hope. Doing the same operation the same number of times in a `Do`

loop takes considerably longer (probably because of indexing):

```
Do[ManhattanDistance[pts[[10]], pts[[20]]], {Length[pts]^2}]; // AbsoluteTiming
(* ==> {1.902417, Null} *)
```

So what programming pattern do you typically use when calculating such a distance matrix and which one would you recommend for this specific problem?

#### ANSWER

Using `Outer`

is here one of the worst methods, and not just because it computes the distance twice, but because you can't leverage vectorization in this approach. This is actually a common issue and an important point to stress: * Outer works pairwise and is unable to utilize the possible vectorized nature of the operation it is performing on an element-by-element basis*.

Here is the code I will adopt from this answer:

```
distances=
With[{tr=Transpose[pts]},
Function[point,Sqrt[Total[(point-tr)^2]]]/@pts
];//Timing
(* {0.046875,Null} *)
```

which is an order of magnitude faster. You can `Compile`

it with a C target which may improve the performance further. Also, essentially the same approach I used in this recent answer, with good performance.

For Manhattan distance, use

```
distances =
With[{tr = Transpose[pts]},
Function[point, Total@Abs[(point - tr)]] /@ pts];
```

**EDIT**

As noted by Ray Koopman in comments, the function `DistanceMatrix`

from the package `HierarchicalClustering``

may be faster for Euclidean distance, for small and medium data size (up to a couple of thousands):

```
<< HierarchicalClustering`
Sqrt[DistanceMatrix[pts]];// AbsoluteTiming
(* {0.019351, Null} *)
```

Note, however, that this is only true for the particular case of Euclidean distance, or perhaps other distances which don't require to set the `DistanceFunction`

option explicitly on the top-level. In other cases (for example, for Manhattan distance), it will be quite slow, because when `DistanceFunction`

is set explicitly, one can not leverage vectorization any more, once again.