[erlang-questions] join/2 function

Mon Dec 18 10:50:07 CET 2006

Actually, if you increase the size of the data set, 
you will get slightly different results (t3 is still
the cheapest, though):

1> L = lists:duplicate(50,lists:duplicate(20,$a)).
["aaaaaaaaaaaaaaaaaaaa",  
 "aaaaaaaaaaaaaaaaaaaa"|...] 
2> timer:tc(implode,t1,[L1,"."]).
{832,
 "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
3> timer:tc(implode,t2,[L1,"."]).
{185,
 "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
4> timer:tc(implode,t3,[L1,"."]).
{49,
 "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
5> timer:tc(implode,t4,[L1,"."]).
{63,
 "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}

where

t4([H|T], Sep) ->
    H ++ lists:concat([Sep ++ X || X <- T]).

t3 is consistently better than t4(*). The reason
why both of them are better than the others is
that they reduce the number of calls to ++
(lists:concat() uses ++, but flatten() doesn't)

(*) Except if L is a short list of very long substrings,
e.g. lists:duplicate(3,lists:duplicate(1000,$a)).

The nice thing about t3 is obviously that it always
puts the (growing) accumulator last in the append.
The left hand side of ++ is always copied, which is
why t1 is so bad. The t2 version using flatten()
gets badly beaten for two reasons: (1) flatten
always rebuilds the whole list, while ++ doesn't
have to rebuild the RHS, and (2) it is implemented
purely in erlang, while ++ is a BIF.

BUT the big problem with running tests this way is
that we don't get a fair picture of the GC component.
Let's run a number of iterations for each:

do1(N, L,S) when N > 0 ->
    t1(L,S),
    do1(N-1,L,S);
do1(_,_,_) ->
    erlang:garbage_collect(),
    ok.
....

do4(N, L,S) when N > 0 ->
    t4(L,S),
    do4(N-1,L,S);
do4(_,_,_) ->
    erlang:garbage_collect(),
    ok.

Using L = ["abc","def","ghi","jkl","mno","pqr","stu
           ","vxyz","123456789","987654321"]

do1(1000,L,"."):   7428 us
do2(1000,L,"."):  12519 us
do3(1000,L,"."):   5370 us
do4(1000,L,"."):   7388 us

Using L2 = lists:duplicate(50,lists:duplicate(50,$a)). 

do1(1000,L2,"."): 1353298 us
do2(1000,L2,"."):  417940 us
do3(1000,L2,"."):  121037 us
do4(1000,L2,"."):   72817 us

Here, t2 makes a comeback since it at least 
avoids appending in the inner loop.

Conclusion: benchmarking is tricky. (:

BR,
Ulf W

> -----Original Message-----
> From: erlang-questions-bounces@REDACTED 
> [mailto:erlang-questions-bounces@REDACTED] On Behalf Of 
> Torbjorn Tornkvist
> Sent: den 18 december 2006 09:34
> To: erlang-questions@REDACTED
> Subject: Re: [erlang-questions] join/2 function
> 
> 
> This was discussed recently on the #erlang channel.
> My take on this was:
> 
> 
> t3(List, Sep) ->
>      lists:foldr(fun(X,[]) -> X; (X,Acc) -> X++Sep++Acc end, 
> "", List).
> 
> 
> See the attached file for all three versions.
> 
> 3> 
> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 4> 
> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 5> 
> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 6> 
> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 7> 
> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 8> 
> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 
> 
> --Tobbe
> 
> 
> Matthias Lang wrote:
> > Hoan Ton-That writes:
> > 
> >  > I think that the following function join should
> >  > be included in the lists module.
> >  > 
> >  > join(Sep, List) ->
> >  >   lists:foldl(fun(A, "") -> A; (A, Acc) -> Acc ++ Sep ++ 
> A end, "", List).
> > 
> > Your implementation of join is quadratic in the length of the list.
> > 
> > Here's one O(N) way to do it:
> > 
> >   lcjoin(Sep, [H|T]) ->
> >       Joined = [ [Sep|X] || X <- T ],
> >       lists:flatten([H|Joined]).
> > 
> > In many cases the lists:flatten/1 call is unnecessary.
> > 
> > Matthias
> 
>