[erlang-questions] input too fast

Fredrik Svahn
Fri Jun 29 14:45:51 CEST 2007


Sorry for double posting, it seems I have misconfigured something at
trapexit...

Fredrik Svahn wrote:
I have also been frustrated by the way the io operations work when
attempting to speed up a few of the example programs for the language
shootout. The reverse-complement program, for instance (which is approx. 60
times slower than the corresponding C program), spends 80% of its time
reading from stdio, and I assume writing out the results is quite costly
too.

Just for fun I made a small patch to efile to allow for stdin/stdout to be
opened as files. It probably has a lot of nasty side effects which I cannot
even imagine in the worst of my nightmares, but the results for reading and
writing are stunning. The "file" approach clocks in at 0.26 seconds for
reading a large file from stdin and writing it to stdout. The corresponding
result for a port program is 1.2 seconds, while the normal io approach
measures in at 16.9 seconds.

I guess reading from stdin is not much of a problem for most Erlang
applications which are supposed to be robust scalable systems staying up for
years. I also think that this has been discussed before, probably at great
length, although I cannot find any relevant posts at the moment. But now,
with escript, it might be a bit more interesting to have fast io
operations for stdin/stdout, at least for Unix systems?

I haven't looked at memory consumption yet, but I expect the result to
be the same as for Ulf, i.e. port programs build up large heaps if they
cannot handle the incoming messages fast enough, while the file and
normal io approaches should not consume much more memory than the
buffer.
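
To check the expectation above for the port approach, one could track how long the message queue gets while copying. The following is only a sketch, not part of the test program: the module name queue_probe and the function portio_probe/0 are made up for illustration, and the binary option is an assumption added to keep the data compact. It uses process_info/2 with message_queue_len and has to be run with -noinput, like the port test.

```erlang
-module(queue_probe).
-export([portio_probe/0]).

%% Hypothetical variant of portio/0: copies stdin to stdout through a
%% port and returns the longest message queue observed along the way.
portio_probe() ->
    Port = open_port({fd, 0, 1}, [eof, binary]),
    Max = loop(Port, 0),
    port_close(Port),
    Max.

loop(Port, Max) ->
    receive
        {Port, {data, Data}} ->
            port_command(Port, Data),
            %% Messages still waiting in this process's mailbox.
            {message_queue_len, Len} =
                process_info(self(), message_queue_len),
            loop(Port, erlang:max(Max, Len));
        {Port, eof} ->
            Max
    end.
```

A queue length that keeps growing with the size of the input would point to the port delivering data faster than the loop consumes it.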

BR /Fredrik

Test-program:

-module(io_test).
-export([file/0,port/0,normal/0,fileio/0,portio/0,normalio/0]).
-define(bufsize, 2048).

file()-> io:format("~n~p~n",[timer:tc(?MODULE, fileio, [])]), halt().
port()-> io:format("~n~p~n",[timer:tc(?MODULE, portio, [])]), halt().
normal()-> io:format("~n~p~n",[timer:tc(?MODULE, normalio, [])]), halt().

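%% Requires the efile patch below: "<stdin>" and "<stdout>" are mapped
%% to file descriptors 0 and 1 instead of being opened by name.
fileio()->
    {ok,StdIn}=file:open("<stdin>",[raw, binary, read]),
    {ok,StdOut}=file:open("<stdout>",[raw, binary, write]),
    fileio(StdIn, StdOut).

%% Copy ?bufsize bytes at a time until end of input.
fileio(StdIn, StdOut) ->
    case file:read(StdIn,?bufsize) of
        eof -> ok;
        {ok, Data} ->
            file:write(StdOut, Data),
            fileio(StdIn, StdOut)
    end.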

%% Open file descriptors 0 and 1 as a port; input arrives as messages.
portio()->
    Port=open_port({fd, 0, 1},[eof]),
    portio(Port),
    port_close(Port).

portio(Port)->
    receive
        {Port, {data, Data}} ->
            port_command(Port, Data),
            portio(Port);
        {_Port, eof} -> ok
    end.

%% Go through the standard io server, ?bufsize characters at a time.
normalio() ->
    case io:get_chars('',?bufsize) of
        eof -> ok;
        Data ->
            io:put_chars(Data),
            normalio()
    end.



Command-lines:

$ erl -noinput -run io_test file < txt  > tmp-file ; tail -n 1 tmp-file
{259951,ok}
$ erl -noinput -run io_test port < txt  > tmp-port ; tail -n 1 tmp-port
{1193521,true}
$ erl -noinput -noshell -run io_test normal < txt  > tmp-normal ; tail -n 1 tmp-normal
{16946068,ok}


Patch for unix on R11B-5:
diff ./erts/emulator/drivers/unix/unix_efile.c ./erts/emulator/drivers/unix/unix_efile.c.old
781,789c781
<
<     if (strcmp(name, "<stdin>") == 0) {
<         fd = 0;
<     } else if (strcmp(name, "<stdout>") == 0) {
<         fd = 1;
<     } else {
<         fd = open(name, mode, FILE_MODE);
<     }
<
---
>     fd = open(name, mode, FILE_MODE);




On 6/26/07, Ulf Wiger (TN/EAB) wrote:
>
>
> I submitted a sum-file entry to the shootout, which worked
> nicely in my environment(*), but failed miserably in the
> official benchmark.
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=hipe&id=2
>
> It uses the (admittedly undocumented) command-line flag for
> installing a custom user process, and opens stdin in line-
> oriented mode.
>
> The problem is that it runs out of memory. As far as I can make
> out, it's because the emulator chops up lines and sends them
> to the process at such a high rate that, even though the
> process is in a tight loop and doing minimal work on each item,
> it can't stop the message queue from building up.
>
> This has disastrous effects when the input file is large enough.
>
> I realise that the feature is undocumented, but perhaps it's still
> a valid point - some sort of generic flow-control on ports,
> similar to the {active, bool()} on sockets, would be just the
> thing here.
>
> (*) I realise that I tested it on an NFS-mounted disk (on a clearcase-
> enabled file system at that). This might have given the port
> sufficient flow control that the program lasted a bit longer, at least.
>
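
For reference, the socket-style flow control Ulf compares against can be sketched as follows. This is an illustration only, on a loopback TCP connection rather than a port; the module name active_once_demo and its run/0 function are made up for the example. With {active, once} the socket delivers at most one message and then falls back to passive mode, so the receiving process decides when the next chunk may arrive.

```erlang
-module(active_once_demo).
-export([run/0]).

%% Illustrative only: a loopback TCP pair where the receiver asks for
%% one message at a time with {active, once}. Returns the bytes received.
run() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, PortNo} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, PortNo, [binary]),
    {ok, Server} = gen_tcp:accept(Listen),
    ok = gen_tcp:send(Client, <<"hello">>),
    ok = gen_tcp:close(Client),
    Total = recv_loop(Server, 0),
    ok = gen_tcp:close(Listen),
    Total.

recv_loop(Socket, Total) ->
    %% Re-arm the socket: it delivers at most one message and then
    %% goes back to passive mode until this call is made again.
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data} ->
            recv_loop(Socket, Total + byte_size(Data));
        {tcp_closed, Socket} ->
            Total
    end.
```

Because the socket is only re-armed after the previous message has been handled, the message queue cannot grow without bound the way it does in the port case described above.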