[erlang-questions] join/2 function

Mon Dec 18 16:04:33 CET 2006

Here's another version 't5' that concatinates elements using stack 
instead of an accumulator:

1> implode:t5(implode:l1(), ",").
"abc,def,ghi,jkl,mno,pqr,stu,vxyz,123456789,987654321"
2> implode:test(1000, implode:l1()).
do1(1000,L,"."):    7176us
do2(1000,L,"."):    7376us
do3(1000,L,"."):    3225us
do4(1000,L,"."):    4608us
do5(1000,L,"."):    2620us
ok
3> implode:test(1000, implode:l2()).
do1(1000,L,"."):  447673us
do2(1000,L,"."):  185972us
do3(1000,L,"."):   61102us
do4(1000,L,"."):   46755us
do5(1000,L,"."):   39452us
ok
4> implode:test(10000, implode:l2()).
do1(10000,L,"."): 4840698us
do2(10000,L,"."): 1861653us
do3(10000,L,"."):  613191us
do4(10000,L,"."):  472865us
do5(10000,L,"."):  393750us

Regards,

Serge

Ulf Wiger (TN/EAB) wrote:
> Actually, if you increase the size of the data set, 
> you will get slightly different results (t3 is still
> the cheapest, though):
> 
> 1> L = lists:duplicate(50,lists:duplicate(20,$a)).
> ["aaaaaaaaaaaaaaaaaaaa",  
>  "aaaaaaaaaaaaaaaaaaaa"|...] 
> 2> timer:tc(implode,t1,[L1,"."]).
> {832,
>  "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
> 3> timer:tc(implode,t2,[L1,"."]).
> {185,
>  "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
> 4> timer:tc(implode,t3,[L1,"."]).
> {49,
>  "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
> 5> timer:tc(implode,t4,[L1,"."]).
> {63,
>  "aaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaa.|..."}
> 
> where
> 
> t4([H|T], Sep) ->
>     H ++ lists:concat([Sep ++ X || X <- T]).
> 
> t3 is consistently better than t4(*). The reason
> why both of them are better than the others is
> that they reduce the number of calls to ++
> (lists:concat() uses ++, but flatten() doesn't)
> 
> (*) Except if L is a short list of very long substrings,
> e.g. lists:duplicate(3,lists:duplicate(1000,$a)).
> 
> The nice thing about t3 is obviously that it always
> puts the (growing) accumulator last in the append.
> The left hand side of ++ is always copied, which is
> why t1 is so bad. The t2 version using flatten()
> gets badly beaten for two reasons: (1) flatten
> always rebuilds the whole list, while ++ doesn't
> have to rebuild the RHS, and (2) it is implemented
> purely in erlang, while ++ is a BIF.
> 
> 
> BUT the big problem with running tests this way is
> that we don't get a fair picture of the GC component.
> Let's run a number of iterations for each:
> 
> do1(N, L,S) when N > 0 ->
>     t1(L,S),
>     do1(N-1,L,S);
> do1(_,_,_) ->
>     erlang:garbage_collect(),
>     ok.
> ....
> 
> do4(N, L,S) when N > 0 ->
>     t4(L,S),
>     do4(N-1,L,S);
> do4(_,_,_) ->
>     erlang:garbage_collect(),
>     ok.
> 
> Using L = ["abc","def","ghi","jkl","mno","pqr","stu
>            ","vxyz","123456789","987654321"]
> 
> do1(1000,L,"."):   7428 us
> do2(1000,L,"."):  12519 us
> do3(1000,L,"."):   5370 us
> do4(1000,L,"."):   7388 us
> 
> Using L2 = lists:duplicate(50,lists:duplicate(50,$a)). 
> 
> do1(1000,L2,"."): 1353298 us
> do2(1000,L2,"."):  417940 us
> do3(1000,L2,"."):  121037 us
> do4(1000,L2,"."):   72817 us
> 
> Here, t2 makes a comeback since it at least 
> avoids appending in the inner loop.
> 
> Conclusion: benchmarking is tricky. (:
> 
> BR,
> Ulf W
> 
>> -----Original Message-----
>> From: erlang-questions-bounces@REDACTED 
>> [mailto:erlang-questions-bounces@REDACTED] On Behalf Of 
>> Torbjorn Tornkvist
>> Sent: den 18 december 2006 09:34
>> To: erlang-questions@REDACTED
>> Subject: Re: [erlang-questions] join/2 function
>>
>>
>> This was discussed recently on the #erlang channel.
>> My take on this was:
>>
>>
>> t3(List, Sep) ->
>>      lists:foldr(fun(X,[]) -> X; (X,Acc) -> X++Sep++Acc end, 
>> "", List).
>>
>>
>> See the attached file for all three versions.
>>
>> 3> 
>> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
>> ","vxyz","123456789","987654321"],
>> "."]).
>> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>> 4> 
>> timer:tc(implode,t1,[["abc","def","ghi","jkl","mno","pqr","stu
>> ","vxyz","123456789","987654321"],
>> "."]).
>> {7,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>> 5> 
>> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
>> ","vxyz","123456789","987654321"],
>> "."]).
>> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>> 6> 
>> timer:tc(implode,t2,[["abc","def","ghi","jkl","mno","pqr","stu
>> ","vxyz","123456789","987654321"],
>> "."]).
>> {10,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>> 7> 
>> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
>> ","vxyz","123456789","987654321"],
>> "."]).
>> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>> 8> 
>> timer:tc(implode,t3,[["abc","def","ghi","jkl","mno","pqr","stu
>> ","vxyz","123456789","987654321"],
>> "."]).
>> {5,"abc.def.ghi.jkl.mno.pqr.stu.vxyz.123456789.987654321"}
>>
>>
>> --Tobbe
>>
>>
>> Matthias Lang wrote:
>>> Hoan Ton-That writes:
>>>
>>>  > I think that the following function join should
>>>  > be included in the lists module.
>>>  > 
>>>  > join(Sep, List) ->
>>>  >   lists:foldl(fun(A, "") -> A; (A, Acc) -> Acc ++ Sep ++ 
>> A end, "", List).
>>> Your implementation of join is quadratic in the length of the list.
>>>
>>> Here's one O(N) way to do it:
>>>
>>>   lcjoin(Sep, [H|T]) ->
>>>       Joined = [ [Sep|X] || X <- T ],
>>>       lists:flatten([H|Joined]).
>>>
>>> In many cases the lists:flatten/1 call is unnecessary.
>>>
>>> Matthias
>>
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
> 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: implode.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20061218/02bc42be/attachment.ksh>