Tail/Non tail recursion

Fri Aug 29 14:07:50 CEST 2003

Here are new benchmark results including Richard A. O'Keefe's unrolled 
version and an unrolled variant that delivers a reversed result.

		R7B01	R8B	R9B	R9C
--------------------------------------------
bm_square_ru	1.29 	0.96 	0.96 	1.00
bm_square_u	1.27 	1.06 	1.10 	1.02
bm_square_d	1.27 	1.09 	1.11 	1.08
bm_square_l	1.49 	1.09 	1.08 	1.08
bm_square_t	1.37 	1.14 	1.12 	1.14
bm_square_m	1.82 	1.59 	1.66 	1.58

Winner: reversed and unrolled. It returns the result list in reverse 
order, but it is tail recursive and the fastest. Often order is not 
important so do not reverse lists just for the fun of it.

Note that the difference between all but the last really is neglectible. 
The last is the lists:map/2 version that suffers from being general 
since it calls a fun in the inner loop which slows things down.

So the best(tm) choice is really a matter of taste, code clarity, etc.

/ Raimo Niskanen, Eralng/OTP, Ericsson AB

Code:
==================================================================
-module(square_bm).
-include("bench.hrl").
-export([benchmarks/0]).
-export([bm_square_d/1,bm_square_u/1,bm_square_t/1,bm_square_ru/1,
	 bm_square_l/1,bm_square_m/1]).

benchmarks() ->
     {100, [bm_square_d,bm_square_u,bm_square_t,bm_square_ru,
	   bm_square_l,bm_square_m]}.

bm_square_d(Iter) ->
     iterate(Iter, fun square_d/1).

bm_square_u(Iter) ->
     iterate(Iter, fun square_u/1).

bm_square_t(Iter) ->
     iterate(Iter, fun square_t/1).

bm_square_ru(Iter) ->
     iterate(Iter, fun square_ru/1).

bm_square_l(Iter) ->
     iterate(Iter, fun square_l/1).

bm_square_m(Iter) ->
     iterate(Iter, fun square_m/1).

iterate(Iter, Fun) ->
     List = lists:seq(1, 30000),
     iterate(Iter, Fun, List).

iterate(0, _Fun, _List) ->
     ok;
iterate(Iter, Fun, List) ->
     Fun(List),
     iterate(Iter-1, Fun, List).

square_d([]) -> [];
square_d([H|T]) -> [H*H|square_d(T)].

square_u([H,I,J|T]) -> [H*H,I*I,J*J|square_u(T)];
square_u([H,I|T]) -> [H*H,I*I|square_u(T)];
square_u([H|T]) -> [H*H|square_u(T)];
square_u([]) -> [].

square_t(L) -> square_t(L, []).
%%
square_t([], R) -> lists:reverse(R);
square_t([H|T], R) -> square_t(T, [H*H|R]).

square_ru(L) -> square_ru(L, []).
%%
square_ru([H,I,J|T], R) -> square_ru(T, [H*H,I*I,J*J|R]);
square_ru([H,I|T], R) -> square_ru(T, [H*H,I*I|R]);
square_ru([H|T], R) -> square_ru(T, [H*H|R]);
square_ru([], R) -> R.

square_l(L) -> [X*X || X <- L].

square_m(L) ->
     lists:map(fun(X) -> X*X end, L).
==================================================================

Richard A. O'Keefe wrote:
> Jilani Khaldi wrote:
> 	> Is there a better way to write this code
> ====>	> when there is a lot of elements in the list?
> 	
> 	> square([H|T]) -> [H*H | square(T)];
> 	> square([]) -> [].
> 
> Bengt Kleberg <Bengt.Kleberg@REDACTED> suggested, more or less,
> 
> 	square(Numbers) -> lists:map(fun(X) -> X*X end, Numbers).
> 
> Using a higher order function so that you don't have to code the
> recursion yourself is "better" in several ways, but I take it that
> the key phrase was "when there is a lot of elements in the list".
> 
> The answer is to _measure_ it.
> 
> 3> sq:t(d, 100000).
> {650,100000}
> 5> sq:t(m, 100000).
> {2990,100000}
> 
> That is, the first (direct) version was about 4.6 times faster,
> and thus "better" for large numbers of elements.
> 
> Here's a transcript of Erlang running my test file.
> f% erl
> 1> c(sq).
> {ok,sq}
> 2> l(sq).
> {module,sq}
> 3> sq:t(d, 100000).
> {650,100000}
> 4> sq:t(u, 100000).
> {430,100000}
> 5> sq:t(m, 100000).
> {2990,100000}
> 6> {2990/650, 2990/430, 650/430}.
> {4.60000,6.95349,1.51163}
> 7> halt().
> 
> We see that the lists:map version is 4.6 times slower than the
> direct version, which is in term 1.5 times slower than an unrolled version.
> 
> On a different processor, or with a different version of Erlang, your
> results might well be different, but the ratios are likely to be roughly
> the same.  If the speed of such an operation is of the utmost importance,
> then it might be worth going with an unrolled version, but the simple
> direct version should normally be fast enough.
> 
> Here is the sq module.  It's not intended as an example of good Erlang
> style, it's just what I typed in a hurry to do the job.
> 
> -module(sq).
> -export([t/2]).
> 
> t(V, N) ->
>     m(V, lists:duplicate(N, 28.0)).
> 
> m(V, L) ->
>     {T0,_} = statistics(runtime),
>     R = r(V, L),
>     {T1,_} = statistics(runtime),
>     {T1-T0, length(R)}.
> 
> r(d, L) -> d(L);
> r(u, L) -> u(L);
> r(m, L) -> lists:map(fun(X) -> X*X end, L).
> 
> d([H|T]) -> [H*H | d(T)];
> d([])    -> [].
> 
> u([H1,H2,H3|T]) -> [H1*H1, H2*H2, H3*H3 | u(T)];
> u([H1|T])       -> [H1*H1 | u(T)];
> u([])           -> [].
>