# [erlang-questions] join/2 function

Ulf Wiger (TN/EAB) <>
Mon Dec 18 10:50:07 CET 2006

```Actually, if you increase the size of the data set,
you will get slightly different results (t3 is still
the cheapest, though):

1> L = lists:duplicate(50,lists:duplicate(20,\$a)).
["aaaaaaaaaaaaaaaaaaaa",
"aaaaaaaaaaaaaaaaaaaa"|...]
2> timer:tc(implode,t1,[L1,"."]).
{832,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
3> timer:tc(implode,t2,[L1,"."]).
{185,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
4> timer:tc(implode,t3,[L1,"."]).
{49,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
5> timer:tc(implode,t4,[L1,"."]).
{63,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}

where

t4([H|T], Sep) ->
H ++ lists:concat([Sep ++ X || X <- T]).

t3 is consistently better than t4(*). The reason
why both of them are better than the others is
that they reduce the number of calls to ++
(lists:concat() uses ++, but flatten() doesn't)

(*) Except if L is a short list of very long substrings,
e.g. lists:duplicate(3,lists:duplicate(1000,\$a)).

The nice thing about t3 is obviously that it always
puts the (growing) accumulator last in the append.
The left hand side of ++ is always copied, which is
why t1 is so bad. The t2 version using flatten()
gets badly beaten for two reasons: (1) flatten
always rebuilds the whole list, while ++ doesn't
have to rebuild the RHS, and (2) it is implemented
purely in erlang, while ++ is a BIF.

BUT the big problem with running tests this way is
that we don't get a fair picture of the GC component.
Let's run a number of iterations for each:

do1(N, L,S) when N > 0 ->
t1(L,S),
do1(N-1,L,S);
do1(_,_,_) ->
erlang:garbage_collect(),
ok.
....

do4(N, L,S) when N > 0 ->
t4(L,S),
do4(N-1,L,S);
do4(_,_,_) ->
erlang:garbage_collect(),
ok.

Using L = ["abc","def","ghi","jkl","mno","pqr","stu
","vxyz","123456789","987654321"]

do1(1000,L,"."):   7428 us
do2(1000,L,"."):  12519 us
do3(1000,L,"."):   5370 us
do4(1000,L,"."):   7388 us

Using L2 = lists:duplicate(50,lists:duplicate(50,\$a)).

do1(1000,L2,"."): 1353298 us
do2(1000,L2,"."):  417940 us
do3(1000,L2,"."):  121037 us
do4(1000,L2,"."):   72817 us

Here, t2 makes a comeback since it at least
avoids appending in the inner loop.

Conclusion: benchmarking is tricky. (:

BR,
Ulf W

> -----Original Message-----
> From:
> [mailto:] On Behalf Of
> Torbjorn Tornkvist
> Sent: den 18 december 2006 09:34
> To:
> Subject: Re: [erlang-questions] join/2 function
>
>
> This was discussed recently on the #erlang channel.
> My take on this was:
>
>
> t3(List, Sep) ->
>      lists:foldr(fun(X,[]) -> X; (X,Acc) -> X++Sep++Acc end,
> "", List).
>
>
> See the attached file for all three versions.
>
> 3>
> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 4>
> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 5>
> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 6>
> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 7>
> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 8>
> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>
>
> --Tobbe
>
>
> Matthias Lang wrote:
> > Hoan Ton-That writes:
> >
> >  > I think that the following function join should
> >  > be included in the lists module.
> >  >
> >  > join(Sep, List) ->
> >  >   lists:foldl(fun(A, "") -> A; (A, Acc) -> Acc ++ Sep ++
> A end, "", List).
> >
> > Your implementation of join is quadratic in the length of the list.
> >
> > Here's one O(N) way to do it:
> >
> >   lcjoin(Sep, [H|T]) ->
> >       Joined = [ [Sep|X] || X <- T ],
> >       lists:flatten([H|Joined]).
> >
> > In many cases the lists:flatten/1 call is unnecessary.
> >
> > Matthias
>
>

```

More information about the erlang-questions mailing list