[erlang-questions] list:join() for erlang?

Fri Sep 14 02:30:10 CEST 2007

I wrote:
>> join0(X, [Y|Ys]) -> X ++ join0(Y, Ys);
>> join0(X, [])     -> X.
>>
[join/2, join1/1, join2/3 deleted]

On 13 Sep 2007, at 10:34 pm, Ben Munat wrote:

> Out of curiosity -- still feeling my way around erlang and functional
> programming -- aren't the "helper" functions here stack-recursive  
> (i.e. *not* tail-recursive)?

I've never heard the term "stack-recursive" before; the normal term
for calls that are not tail calls is "body" calls (in this case,
body recursion).  Yes, they are body calls.  What of that?  They are
what they need to be.  As a matter of fact, the Prolog equivalents
of these *would* be tail recursive:

     join0([], X, X).
     join0([Y|Ys], X, Ans) :-
         append(X, R, Ans),
         join0(Ys, Y, R).

It would be possible to implement Erlang in such a way that these
Erlang functions were also tail recursive.  It wouldn't even be hard.
In effect, each function would receive a hidden parameter that was
the address of where to put the result.  This would require a
different kind of garbage collector, though; the current one can and
does exploit the fact that there cannot be pointers from "older"
cells to "newer" ones.

Tail recursion is nice when you can use it, but in the current
version of Erlang it would be necessary to build up a result in
reverse and then reverse it at the end.  There are times when that's
a good idea, but this isn't one of them.  Another idea would be not
to build up the result but to build a stack of lists to be appended.
Let's see what happens.

join([], _) -> [];
join(Xs, [])  -> [Last|Stack] = reverse(Xs), join0(Stack, Last);
join(Xs, [C]) -> [Last|Stack] = reverse(Xs), join1(Stack, Last, C);
join(Xs, Sep) -> [Last|Stack] = reverse(Xs), join2(Stack, Last, Sep).

join0([Ys|Yss], Xs) -> join0(Yss, Ys++Xs);
join0([],       Xs) -> Xs.

join1([Ys|Yss], Xs, C) -> join1(Yss, Ys++[C|Xs], C);
join1([],      Xs, _) -> Xs.

join2([Ys|Yss], Xs, Sep) -> join2(Yss, Ys++Sep++Xs, Sep);
join2([],      Xs, _  ) -> Xs.

In effect, this also uses a stack, but it builds the stack as a list
in the heap.  Rather to my surprise, this second version turned out
to be faster than the previous version.  Mind you, you have to make
the inputs so large, and repeat the operation so many times, to get
a measurable time, that it's not really a problem.

I wondered whether unrolling the loops would be worth while, but
the measurements fluctuated so much it was hard to tell.  What
I'd really like is a high resolution cpu time BIF, which is the
subject of another thread...