[erlang-questions] join/2 function
Ulf Wiger (TN/EAB)
ulf.wiger@REDACTED
Mon Dec 18 10:50:07 CET 2006
Actually, if you increase the size of the data set,
you will get slightly different results (t3 is still
the cheapest, though):
1> L = lists:duplicate(50,lists:duplicate(20,$a)).
["aaaaaaaaaaaaaaaaaaaa",
"aaaaaaaaaaaaaaaaaaaa"|...]
2> timer:tc(implode,t1,[L1,"."]).
{832,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
3> timer:tc(implode,t2,[L1,"."]).
{185,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
4> timer:tc(implode,t3,[L1,"."]).
{49,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
5> timer:tc(implode,t4,[L1,"."]).
{63,
"aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
where
t4([H|T], Sep) ->
H ++ lists:concat([Sep ++ X || X <- T]).
t3 is consistently better than t4(*). The reason
why both of them are better than the others is
that they reduce the number of calls to ++
(lists:concat() uses ++, but flatten() doesn't)
(*) Except if L is a short list of very long substrings,
e.g. lists:duplicate(3,lists:duplicate(1000,$a)).
The nice thing about t3 is obviously that it always
puts the (growing) accumulator last in the append.
The left hand side of ++ is always copied, which is
why t1 is so bad. The t2 version using flatten()
gets badly beaten for two reasons: (1) flatten
always rebuilds the whole list, while ++ doesn't
have to rebuild the RHS, and (2) it is implemented
purely in erlang, while ++ is a BIF.
BUT the big problem with running tests this way is
that we don't get a fair picture of the GC component.
Let's run a number of iterations for each:
do1(N, L,S) when N > 0 ->
t1(L,S),
do1(N-1,L,S);
do1(_,_,_) ->
erlang:garbage_collect(),
ok.
....
do4(N, L,S) when N > 0 ->
t4(L,S),
do4(N-1,L,S);
do4(_,_,_) ->
erlang:garbage_collect(),
ok.
Using L = ["abc","def","ghi","jkl","mno","pqr","stu
","vxyz","123456789","987654321"]
do1(1000,L,"."): 7428 us
do2(1000,L,"."): 12519 us
do3(1000,L,"."): 5370 us
do4(1000,L,"."): 7388 us
Using L2 = lists:duplicate(50,lists:duplicate(50,$a)).
do1(1000,L2,"."): 1353298 us
do2(1000,L2,"."): 417940 us
do3(1000,L2,"."): 121037 us
do4(1000,L2,"."): 72817 us
Here, t2 makes a comeback since it at least
avoids appending in the inner loop.
Conclusion: benchmarking is tricky. (:
BR,
Ulf W
> -----Original Message-----
> From: erlang-questions-bounces@REDACTED
> [mailto:erlang-questions-bounces@REDACTED] On Behalf Of
> Torbjorn Tornkvist
> Sent: den 18 december 2006 09:34
> To: erlang-questions@REDACTED
> Subject: Re: [erlang-questions] join/2 function
>
>
> This was discussed recently on the #erlang channel.
> My take on this was:
>
>
> t3(List, Sep) ->
> lists:foldr(fun(X,[]) -> X; (X,Acc) -> X++Sep++Acc end,
> "", List).
>
>
> See the attached file for all three versions.
>
> 3>
> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 4>
> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 5>
> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 6>
> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 7>
> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
> 8>
> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
> ","vxyz","123456789","987654321"],
> "."]).
> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>
>
> --Tobbe
>
>
> Matthias Lang wrote:
> > Hoan Ton-That writes:
> >
> > > I think that the following function join should
> > > be included in the lists module.
> > >
> > > join(Sep, List) ->
> > > lists:foldl(fun(A, "") -> A; (A, Acc) -> Acc ++ Sep ++
> A end, "", List).
> >
> > Your implementation of join is quadratic in the length of the list.
> >
> > Here's one O(N) way to do it:
> >
> > lcjoin(Sep, [H|T]) ->
> > Joined = [ [Sep|X] || X <- T ],
> > lists:flatten([H|Joined]).
> >
> > In many cases the lists:flatten/1 call is unnecessary.
> >
> > Matthias
>
>
More information about the erlang-questions
mailing list