[erlang-questions] Strange performance issue - advice needed
Ian
hobson42@REDACTED
Sat Apr 14 17:18:19 CEST 2012
Hi all,
Newbie here, learning Erlang. Having written some awfully slow code in
new languages before, I am keen to write speedy code this time.
I am writing a solution to the telegram problem. In the code below,
tgpack receives words and builds lines, while tgwriter receives those
lines and writes the result to a text file. Other code, not shown,
generates the words and sends them, as binaries, to tgpack.
I am getting some very strange timings. When I ran it with 50 characters
per line, it took 18.8 seconds to run. So I doubled the length of the
line, thus halving the number of lines. Same input file.
If the time were dominated by the line count then the time taken would be
about half. If it were dominated by the line length then it would be
about double. I did not think I would get a timing outside this range
(9-40 seconds).
It took 3.6 seconds!
So I doubled the line length again, and at 200 characters per line, it
took 0.6 seconds!
I tried 25 chars per line, it took 122 seconds.
In every case it was reading the same 1.7MB file. I also repeated the
tests and the timings are consistent (+/- 3%).
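The four measurements above can be put side by side to see how the run time scales each time the line length doubles. A minimal sketch, where the module name tgtimings and the function ratios/0 are made up for illustration and the numbers are simply the timings quoted above:

```erlang
-module(tgtimings).
-export([ratios/0]).

%% Timings quoted in the post: {CharsPerLine, Seconds}.
timings() ->
    [{25, 122.0}, {50, 18.8}, {100, 3.6}, {200, 0.6}].

%% For each pair of neighbouring runs, return
%% {ShorterLength, LongerLength, SlowdownFactor}, where the factor is
%% the time for the shorter lines divided by the time for the longer ones.
ratios() ->
    T = timings(),
    Pairs = lists:zip(lists:droplast(T), tl(T)),
    [{L1, L2, S1 / S2} || {{L1, S1}, {L2, S2}} <- Pairs].
```

Calling tgtimings:ratios() gives factors of roughly 6.5, 5.2 and 6.0, so each halving of the line length costs about five to six times as much time, well outside the factor of two that a cost proportional to the line count alone would explain.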
So it appears that short lines are terribly inefficient. I want to
understand why. Can anyone spot the cause?
Thanks
Ian
The code for tgpack:

-module(tgpack).
-export([pack/2]).

%% tgpack - reads words, builds lines and sends them to the writer,
%% where lines are as long as possible and less than Length.
%% Upon reading <<>>, pass it on to close the file, and quit.
%% (binaryFromList/1 is not shown in this post.)
pack(Length, Writer) ->
    receive
        <<>> -> Writer ! <<>>;  % handle empty file
        Word -> buildlist(Length, byte_size(Word), Writer, [Word])
    end.

%% buildlist
%% Length is max length of line
%% Size is length of line so far
%% Writer is file writing process
%% List is list of binaries built up for current line, in reverse order
buildlist(Length, Size, Writer, List) ->
    receive
        <<>> ->
            Writer ! binaryFromList(List),  % send last line
            Writer ! <<>>;                  % close the output
        Word ->
            S = Size + 1 + byte_size(Word),
            case S >= Length of
                true ->   % write old line and start the next one
                    Writer ! binaryFromList(List),
                    buildlist(Length, byte_size(Word), Writer, [Word]);
                false ->  % add word to current line
                    buildlist(Length, S, Writer, [Word | List])
            end
    end.
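binaryFromList/1 is called above but not included in the post. A minimal sketch of what such a helper might look like, assuming it reverses the accumulated words and joins them with single spaces (the "+ 1" in the size calculation suggests a one-byte separator). The module name tgjoin is made up so the example compiles on its own, and this is not necessarily the original implementation:

```erlang
-module(tgjoin).
-export([binaryFromList/1]).

%% Hypothetical helper, not the poster's code.
%% List holds the words of one line as binaries, in reverse order.
%% Returns a single binary with the words in reading order,
%% separated by single spaces.
binaryFromList(List) ->
    Words = lists:reverse(List),
    %% lists:join/2 (OTP 19 or later) interleaves the separator;
    %% the result is an iolist that is flattened in one pass.
    iolist_to_binary(lists:join(<<" ">>, Words)).
```

For example, tgjoin:binaryFromList([<<"world">>, <<"hello">>]) returns <<"hello world">>. Building the line once from an iolist avoids repeatedly appending to a growing binary, which is one of the things worth checking in a helper like this.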
The code for tgwriter:

-module(tgwriter).
-export([writer/1]).

%% writer - reads lines and writes them to Filename until eof
%% eof is a zero length binary
writer(Filename) ->
    {ok, Fh} = file:open(Filename, [write, {delayed_write, 8096, 1000}]),
    writemore(Fh).

writemore(Fh) ->
    receive
        <<>> -> file:close(Fh);
        Msg ->  % write line
            file:write(Fh, Msg),
            file:write(Fh, "\n"),
            writemore(Fh)
    end.