Overlong wait in receive after

Robert Raschke <>
Fri Feb 19 18:33:36 CET 2010


Hi,

has anyone encountered spurious overlong waits using receive ... after? That
is, the code says "after 1000 ->" but it's way longer than 1 second before
it gets there.

My code starts a Java jinterface node, the Java code creates the node,
creates an mbox, and waits a minute for a message to appear and sends one
back.

The Erlang code starts the Java as a port program, and starts sending ok
messages to the registered mbox of the Java program spaced 1 second apart.
My logging indicates that the sending of the messages is actually spaced
somewhere between 5 and 12 seconds apart!

This is on a Windows 2003 Server running R12B-5.

In Erlang (started with -sname "") this is roughly what's
happening (apologies for typos):

run() ->
    Node_Name = "",
    Mbox = {box, list_to_atom(Node_Name)},
    Port_Pid = spawn_link(?MODULE, run_port_program, [Node_Name,
atom_to_list(erlang:get_cookie())]),
    {ok, Java_Pid} = shake_hands(Mbox, 0).

shake_hands(Mbox, N) when N < 50 ->
    error_logger:info_report([{module, ?MODULE}, {handshake, N}, {mbox,
Mbox}]),
    Mbox ! {self(), ok},
    receive
        {ok, Pid} ->
            {ok, Pid}
    after 1000 ->
        error_logger:error_report([{module, ?MODULE}, {'handshake timeout',
N}]),
        shake_hands(Mbox, N+1)
    end;
shake_hands(_Mbox, N) when N >= 50 ->
    timeout.

run_port_program(Node_Name, Cookie) ->
    Cmd = lists:flatten(["java MyCode \"", Node_Name, "\" \"", Cookie,
"\""]),
    P = open_port({spawn, Cmd}, [stream, {line, 100}, exit_status]),
    loop(P, Node_Name, [], []).

loop(Port, Name, Response, Line) ->
    %% handle the messages coming from Java's stdout ...


And the Java bit is doing:

String node_name = args[0];
String cookie = args[1];
OtpNode node = new OtpNode(node_name, cookie);
OtpMbox mbox = node.createMbox("box");
OtpErlangTuple msg = (OtpErlangTuple) mbox.receive(60000);


And my log contains entries where the time between the 'handshake' and the
'handhsake timeout' is somewhere between 5 and 12 seconds! Most of the times
the Java mbox.receive(60000) ends up timing out, after which, suddenly, the
log messages end up being 1 second spaced!

All extremely puzzling. To me it looks like something on the machine is
interfering with low-level comms. But that is currently just a hunch.

Any pointers or ideas on where to go hunting are greatly appreciated. I'll
probably try to create a standalone setup next, to see if I can find some
minimal code to exhibit this behaviour.

Thanks,
Robby


More information about the erlang-questions mailing list