[erlang-questions] Please review this basic Erlang doc?

Fri Feb 2 05:54:16 CET 2007

"Rodrigo Ahumada M." <rodahummont@REDACTED> wrote:
	an example, a simple problem:

	i got a list of vectors and i want a function that returns the indexes
	of the two closest vectors, and their distance.

You say "vectors", but people used to other functional languages are
likely to read "vector" as "tuple".

Why do you want INDICES of the two closest vectors, rather than
the vectors themselves?

	vector_dist(V1, V2) ->
	  math:sqrt(lists:sum(lists:zipwith(fun(A, B) ->=20
						C =3D A - B,
						C * C
					    end, V1, V2))).

In Haskell,

    vector_distance v1 v2 = sqrt (sum [(x-y)^2 | (x,y) <- zip v1 v2])

is fine.  In Erlang,

    vector_dist(V1, V2) ->
        math:sqrt(vector_dist_squared(V1, V2, 0.0)).

    vector_dist_squared([X|Xs], [Y|Ys], Sum) ->
	vector_dist_squared(Xs, Ys, Sum + (X-Y)*(X-Y));
    vector_dist_squared([], [], Sum) -> Sum.

might actually be clearer.  As it turns out, we have a use for
vector_dist_squared.  What use is that?  Simple:  if we want to compare
the distances, we can compare the squared distances.  We only have to
call sqrt once in the entire calculation.

The next step may be the interesting one.
The point of higher order functions is to separate out "control" patterns
from specific calculations.

The control pattern we have here is a kind of "fold":
                                          1   I     J   N
    Traverse the pairs (Xi,I,Xj,J} where [...Xi....Xj....]

    applying function Mapping to each
    and combining the results of that with Folder.

In fact we always use Mapping and Folding together, so we only
need Combiner(Previous, Xi, I, Xj, J) = Folder(Previous, Mapping(Xi,I,Xj,J)).

    triangle_map_and_fold(List, Initial, Combiner) ->
	triangle_map_and_fold_outer(List, 0, Combiner, Initial).

    triangle_map_and_fold_outer([X|Xs], N0, Combiner, R0) ->
        N1 = N0 + 1,
        triangle_map_and_fold_outer(Xs, N1, Combiner,
	    triangle_map_and_fold_inner(Xs, X, N1, N1, Combiner, R0));
    triangle_map_and_fold_outer([], _, _, R) -> R.

    triangle_map_and_fold_inner([Y|Ys], X, N1, M0, Combiner, R0) ->
	M1 = M0 + 1,
	triangle_map_and_fold_inner(Ys, X, N1, M1, Combiner,
	    Combiner(R0, X, N1, Y, M1));
    triangle_map_and_fold_inner([], _, _, _, _, R) -> R.

Compare this with the C99 equivalent:

    T triangle_map_and_fold(
        S const *a, int n, T initial,
        T (*combiner)(T, S, int, S, int)
    ) {        
	for (int i = 0; i < n; i++) {
	    S x = a[i];
	    for (int j = i+1; j < n; j++) {
	        S y = a[j];
	        initial = combiner(initial, x, i, y, j);
	    }
	}
	return initial;
    }

In C99, we have two loops.  In Erlang, two recursive functions.  Same thing.

What do we want to do with each (X,N,Y,M) pair?  We want to compute the
(squared) distance between X and Y and retain the triple {N,M,D}.
What do we want to do with these triples?  Remember the one with the
smallest D.

    closest_pair(Vectors) ->
        {Nbest, Mbest, Dsquared} =
            triangle_map_and_fold(Vectors, start,
                 fun (Previous, X, N, Y, M) ->
                     D = vector_dist_squared(X, Y, 0.0),
                     case Previous
                       of {_,_,D1} when D1 < D -> Previous
                        ; _                    -> {N,M,D}
                     end
                 end),
	{Nbest, Mbest, math:sqrt(Dsquared)}.

If there are V vectors, this does about V**2/2 distance calculations
and builds at most about V**2/2 3-tuples.  We can eliminate the higher-order
bit by unfolding (making a copy of the function and replacing arguments
by their values), and in so doing we could eliminate all of those tuples
except the final result, but I wouldn't bother.

Why not?  Because if I had to find the closest pair of vectors in a large
collection of vectors I'd be reaching for one of Hanan Sammet's books on
spatial data structures to find a cheaper data structure.