[erlang-questions] Strange performance issue - advice needed

Ian <>
Sat Apr 14 17:18:19 CEST 2012


Hi all,

Newbie here, learning Erlang. Having written some awfully slow code in 
new languages before, I am keen to write speedy code this time.

I am writing a solution to the telegram problem. In the code below, 
tgpack reads words and builds lines, while tgwriter receives those 
lines and writes them to a text file. Other code, not shown, 
generates words and sends them as binaries to tgpack.

I am getting some very strange timings. When I ran it with 50 characters 
per line, it took 18.8 seconds to run. So I doubled the length of the 
line, thus halving the number of lines. Same input file.

If the time were dominated by the line count, then the time taken would be 
about half. If it were dominated by the line length, then it would be 
about double. I did not think I would get a timing outside this range 
(9-40 seconds).

It took 3.6 seconds!

So I doubled the line length again, and at 200 characters per line, it 
took 0.6 seconds!

When I tried 25 characters per line, it took 122 seconds.

In every case it was reading the same 1.7MB file. I also repeated the 
tests and the timings are consistent (+/- 3%).
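
For reference, a minimal sketch of how a timing like this can be taken 
with timer:tc/1. The module name tgtime and the function seconds/1 are 
made up for illustration and are not part of the program being measured.

```erlang
-module(tgtime).
-export([seconds/1]).

%% Hypothetical helper: run a zero-argument fun once and return
%% {ElapsedSeconds, Result}. timer:tc/1 reports wall-clock microseconds.
seconds(Fun) when is_function(Fun, 0) ->
    {Micros, Result} = timer:tc(Fun),
    {Micros / 1000000, Result}.
```

It would be called as tgtime:seconds(fun() -> ... end), with the fun 
wrapping one complete run at a given line length.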

So it appears that short lines are terribly inefficient. I want to 
understand why. Can anyone spot the cause?

Thanks

Ian

The code for tgpack
-module(tgpack).
-export([pack/2]).
%% tgpack - reads words, builds lines and sends them to the writer,
%% where lines are as long as possible and shorter than Length.
%% Upon reading <<>>, forward it to the writer to close the file, and quit.
pack(Length,Writer) ->
     receive
         <<>> -> Writer ! <<>>;  % handle empty file
         Word -> buildlist(Length,byte_size(Word),Writer,[Word])
     end.

% buildlist
% Length is max length of line
% Size is length of line so far
% Writer is file writing process
% List is list of binaries built up for current line in reverse order
buildlist(Length,Size,Writer,List) ->
     receive
         <<>> ->
             Writer ! binaryFromList(List), % send last line.
             Writer ! <<>>;   %  close the output
         Word ->
             S = Size + 1 + byte_size(Word),
             case S >= Length of
                 true -> % write old and start next line
                     Writer ! binaryFromList(List),
                     buildlist(Length,byte_size(Word),Writer,[Word]);
                 false ->  % add word to current
                     buildlist(Length,S,Writer,[Word|List] )
             end
     end.
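
binaryFromList/1 is called above but is not shown in this post. A 
minimal sketch of what such a helper might look like, assuming it 
reverses the list and joins the words with a single space (the "+ 1" in 
the size calculation suggests one separator byte per word). The module 
name tgjoin is made up for illustration, and the real helper may differ.

```erlang
-module(tgjoin).
-export([binaryFromList/1]).

%% Hypothetical helper, not the code from the original program.
%% List holds the words of one line in reverse order and is non-empty.
%% Folding over the reversed list while prepending each word restores
%% the original order; a single space goes between adjacent words.
binaryFromList([Last | Rest]) ->
    IoList = lists:foldl(fun(Word, Acc) -> [Word, $\s | Acc] end,
                         [Last], Rest),
    iolist_to_binary(IoList).
```

Building an iolist first and converting once keeps the work per line 
proportional to the line length.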

The code for tgwriter
-module(tgwriter).
-export([writer/1]).
%% writer - reads lines and writes them to Filename until eof
%%    eof is a zero length binary
writer(Filename) ->
     {ok, Fh} = file:open(Filename,[write,{delayed_write, 8096, 1000}]),
     writemore(Fh).

writemore(Fh) ->
     receive
         <<>> -> file:close(Fh);
         Msg ->    % write line
             file:write(Fh, Msg),
             file:write(Fh, "\n"),
             writemore(Fh)
     end.



