[erlang-questions] surprise! binary vs list in file parse.

Pablo Polvorin pablo.polvorin@REDACTED
Thu Jun 26 22:01:35 CEST 2008


On my machine, which doesn't have the R12B improvements, I get the same
results as you.
But after a small rewrite, things improve:

2> timer_avg:tc(act,parse,["test.txt"],1000).
Max: 26200
Min: 3066
Avg: 3853.10
ok
3> timer_avg:tc(act_2,parse,["test.txt"],1000).
Max: 28206
Min: 5495
Avg: 5968.27
ok
4> timer_avg:tc(act_3,parse,["test.txt"],1000).
Max: 19227
Min: 3045
Avg: 3215.01
ok
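
The source of timer_avg was not posted in this thread, so for anyone who wants to reproduce numbers like these, here is a minimal sketch of what such a helper might look like. The module name timer_avg_sketch and the helper stats/1 are assumptions of mine; only timer:tc/3 and the Max/Min/Avg output come from the thread. Times are in microseconds, as returned by timer:tc/3.

```erlang
-module(timer_avg_sketch).
-export([tc/4, stats/1]).

%% Run apply(M, F, A) N times and print the max, min and average
%% elapsed time in microseconds. Hypothetical stand-in for timer_avg.
tc(M, F, A, N) when is_integer(N), N > 0 ->
    %% timer:tc/3 returns {Microseconds, Result}; keep only the time.
    Times = [element(1, timer:tc(M, F, A)) || _ <- lists:seq(1, N)],
    {Max, Min, Avg} = stats(Times),
    io:format("Max: ~p~nMin: ~p~nAvg: ~.2f~n", [Max, Min, Avg]),
    ok.

%% Pure helper so the arithmetic can be checked without timing anything.
stats(Times) ->
    {lists:max(Times), lists:min(Times), lists:sum(Times) / length(Times)}.
```

It would be called the same way as above, e.g. timer_avg_sketch:tc(act_3, parse, ["test.txt"], 1000).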

The code I used for act_3 looks like this:

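%% Entry point: start at offset 0 with no fields or lines collected.
parse(Bin) when is_binary(Bin) ->
    parse(Bin, 0, [], []).

%% CurrentOffset is the number of bytes of the current field scanned so far.
%% CurrentLine holds the fields of the current line in reverse order.
parse(Bin, CurrentOffset, CurrentLine, Lines) ->
    case Bin of
        %% A comma follows the scanned bytes: the field is complete.
        <<Field:CurrentOffset/binary, $\,, Rest/binary>> ->
            parse(Rest, 0, [Field|CurrentLine], Lines);
        %% A newline follows: the field and the line are both complete.
        <<Field:CurrentOffset/binary, $\n, Rest/binary>> ->
            parse(Rest, 0, [], [lists:reverse([Field|CurrentLine])|Lines]);
        %% Any other byte: extend the current field by one byte.
        <<_Field:CurrentOffset/binary, _Char, _Rest/binary>> ->
            parse(Bin, CurrentOffset+1, CurrentLine, Lines);
        %% Nothing left: return the lines in their original order.
        <<>> ->
            {ok, lists:reverse(Lines)};
        %% Input ended in the middle of a field (no trailing newline).
        _ ->
            {error, bad_file}
    end.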

It explicitly keeps track of the current offset to avoid copying binaries
on each append.
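
For contrast, here is a minimal sketch of the append style that the offset trick avoids. It is not the original act_2.erl, whose source was not posted in this thread; the module name append_sketch and its structure are my own assumptions. Each ordinary byte is appended to the field being built, and as far as I know that append may allocate a new binary every time on releases without the R12B improvements.

```erlang
-module(append_sketch).
-export([parse/1]).

%% Hypothetical append-style parser, written only for comparison.
parse(Bin) when is_binary(Bin) ->
    parse(Bin, <<>>, [], []).

%% Comma: the field built so far is complete.
parse(<<$,, Rest/binary>>, Field, Line, Lines) ->
    parse(Rest, <<>>, [Field | Line], Lines);
%% Newline: the field and the line are both complete.
parse(<<$\n, Rest/binary>>, Field, Line, Lines) ->
    parse(Rest, <<>>, [], [lists:reverse([Field | Line]) | Lines]);
%% Any other byte: append it to the field. This is the per-byte
%% append that the offset-based version replaces with a counter.
parse(<<Char, Rest/binary>>, Field, Line, Lines) ->
    parse(Rest, <<Field/binary, Char>>, Line, Lines);
%% End of input with no partial field: return lines in order.
parse(<<>>, <<>>, _Line, Lines) ->
    {ok, lists:reverse(Lines)};
%% End of input in the middle of a field (no trailing newline).
parse(<<>>, _Field, _Line, _Lines) ->
    {error, bad_file}.
```

For example, append_sketch:parse(<<"a,b\nc,d\n">>) returns {ok,[[<<"a">>,<<"b">>],[<<"c">>,<<"d">>]]}, while input that does not end in a newline, such as <<"a,b">>, gives {error,bad_file}.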

I think that on newer Erlang releases the performance of act_2 should be
similar, but I don't have one installed here to test. See the comments at
http://www.erlang.org/pipermail/erlang-questions/2008-June/036166.html

2008/6/26 litao cheng <litaocheng@REDACTED>:

> Yes, I retried the test as you suggested.
> The result is the same.
> Thank you!
>
> 2008/6/26 Vlad Dumitrescu <vladdu55@REDACTED>:
>
>> Hi
>>
>> 2008/6/26 litao cheng <litaocheng@REDACTED>:
>>
>>> I have read Joel's article, "Parsing text and binary files with Erlang". In
>>> the article, the author shows how to parse a comma-delimited text file.
>>> I wrote the code for practice; the module source is act.erl. By the way,
>>> at the bottom of Joel's article a commenter says that he uses binaries
>>> instead of some of the lists, which seems very efficient, so I wrote a
>>> second version, act_2.erl.
>>>
>>> Finally, I wanted to test how much faster act_2 is than act. Surprise:
>>> on my computer, the result is:
>>> 96> timer_avg:tc(act, parse, ["test.txt"], 100).
>>> Max: 15991
>>> Min: 1
>>> Avg: 2339.05
>>>
>>> 100> timer_avg:tc(act_2, parse, ["test.txt"], 100).
>>> Max: 15997
>>> Min: 1
>>> Avg: 4839.03
>>>
>>> timer_avg is a module that evaluates apply(Module, Function, Arguments)
>>> N times and measures the elapsed real time, reporting Max, Min and Avg
>>> (using timer:tc/3).
>>>
>>> Can anyone explain this?
>>>
>>
>> You might want to retry the tests, starting them instead with
>> spawn(timer_avg, tc, [act, parse, ["test.txt"], 100]).
>> spawn(timer_avg, tc, [act_2, parse, ["test.txt"], 100]).
>>
>> This way you don't get any random garbage collections to interfere.
>>
>> regards,
>> Vlad
>>



-- 
--
pablo
http://ppolv.wordpress.com