[erlang-questions] Strange performance issue - advice needed

Sat Apr 14 19:14:16 CEST 2012

I can't read the code well on my phone, but if you're using line-oriented I/O, that tends to be slow.

I would suggest reading http://erlang.org/pipermail/erlang-questions/2012-March/065374.html as well.
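
As a rough sketch of what avoiding per-line I/O overhead could look like (the module name tgwriter_sketch and the function write_lines/2 are made up for illustration and are not from the code quoted below), the file can be opened in raw mode with delayed_write, and each line plus its newline passed as iodata in one file:write/2 call. Whether this accounts for the timings described below is not something I have verified.

```erlang
-module(tgwriter_sketch).
-export([write_lines/2]).

%% Hypothetical helper, not part of the original tgwriter module:
%% writes each binary in Lines to Filename, followed by a newline.
write_lines(Filename, Lines) ->
    %% raw: the file is accessed directly by the calling process instead
    %%      of going through a separate file I/O process.
    %% delayed_write: small writes are buffered before reaching the OS.
    {ok, Fh} = file:open(Filename, [write, raw, delayed_write]),
    %% file:write/2 accepts iodata, so the line and its newline can be
    %% handed over as a nested list without building a new binary.
    ok = file:write(Fh, [[Line, $\n] || Line <- Lines]),
    ok = file:close(Fh).
```

For example, tgwriter_sketch:write_lines("out.txt", [<<"first line">>, <<"second line">>]) writes two newline-terminated lines with a single write call.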


On 14 Apr 2012, at 16:18, Ian <hobson42@REDACTED> wrote:

> Hi all,
> Newbie here, learning Erlang. Having written some awfully slow code in new languages before, I am keen to write speedy code this time.
> I am writing a solution to the telegram problem. In the code below, tgpack reads words and builds lines, while tgwriter receives those lines and writes the result to a text file. Other code, not shown, generates words and sends them as binaries to tgpack.
> I am getting some very strange timings. When I ran it with 50 characters per line, it took 18.8 seconds to run. So I doubled the length of the line, thus halving the number of lines. Same input file.
> If the time was dominated by the line count then the time taken would be about half. If it was dominated by the line length then it would be about double. I did not think I would get a timing outside this range (9-40 seconds).
> It took 3.6 seconds!
> So I doubled the line length again, and at 200 characters per line, it took 0.6 seconds!
> I tried 25 chars per line, it took 122 seconds.
> In every case it was reading the same 1.7MB file. I also repeated the tests and the timings are constant (+/- 3%).
> So it appears that short lines are terribly inefficient. I want to understand why. Can anyone spot the cause?
> Thanks
> Ian
> The code for tgpack:
> -module(tgpack).
> -export([pack/2]).
> %% tgpack - reads words, builds lines and sends them to the writer,
> %% where lines are as long as possible and less than Length.
> %% Upon reading <<>>, send it on to close the file, and quit.
> pack(Length,Writer) ->
>    receive
>        <<>> -> Writer ! <<>>;  % handle empty file
>        Word -> buldlist(Length,byte_size(Word),Writer,[Word])
>    end.
> % buldlist
> % Length is max length of line
> % Size is length of line so far
> % Writer is file writing process
> % List is list of binaries built up for current line in reverse order
> buldlist(Length,Size,Writer,List) ->
>    receive
>        <<>> ->
>            Writer ! binaryFromList(List), % send last line.
>            Writer ! <<>>;   %  close the output
>        Word ->
>            S = Size + 1 + byte_size(Word),
>            case S >= Length of
>                true -> % write old and start next line
>                    Writer ! binaryFromList(List),
>                    buldlist(Length,byte_size(Word),Writer,[Word]);
>                false ->  % add word to current
>                    buldlist(Length,S,Writer,[Word|List] )
>            end
>    end.
> The code for tgwriter:
> -module(tgwriter).
> -export([writer/1]).
> %% writer - reads lines and writes them to Filename until eof
> %%    eof is a zero length binary
> writer(Filename) ->
>    {ok, Fh} = file:open(Filename,[write,{delayed_write, 8096, 1000}]),
>    writemore(Fh).
> writemore(Fh) ->
>    receive
>        <<>> -> file:close(Fh);
>        Msg ->    % write line
>            file:write(Fh, Msg),
>            file:write(Fh, "\n"),
>            writemore(Fh)
>    end.
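
The quoted tgpack code calls binaryFromList/1, which is not shown. For anyone trying to reproduce the timings, here is one guess at what it might do, assuming it joins the reversed word list with single spaces (that matches the Size + 1 + byte_size(Word) bookkeeping above, but it is an assumption, not the poster's code). The names tgjoin_sketch and binary_from_list/1 are made up.

```erlang
-module(tgjoin_sketch).
-export([binary_from_list/1]).

%% Hypothetical stand-in for the unseen binaryFromList/1.
%% List holds the words of one line in reverse order; the result is a
%% single binary with the words in original order, separated by spaces.
binary_from_list([]) ->
    <<>>;
binary_from_list([Last | Rest]) ->
    %% Walking the reversed list and prepending each word restores the
    %% original order. The iolist is flattened only once, at the end.
    Words = lists:foldl(fun(W, Acc) -> [W, $\s | Acc] end, [Last], Rest),
    iolist_to_binary(Words).
```

With this version, tgjoin_sketch:binary_from_list([<<"c">>, <<"b">>, <<"a">>]) gives <<"a b c">>. If the real function instead appends binaries pairwise, its cost per line could look quite different.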
