net_adm:names() failing

Per Hedeland per@REDACTED
Mon Oct 4 10:45:34 CEST 1999

>More investigation suggests that the problem is on the epmd
>server side.  I began to suspect that the problem was at the
>socket level and my research brought me to the SO_LINGER
>socket level option.  This option is supposed to control what
>happens when you close a socket that has pending writes on it.

SO_LINGER just makes the close() block until the pending data is sent
or the linger timeout expires (allowing you to find out if the sending
failed). Without it the close() returns immediately, but the kernel will
still send the data (modulo timeout, remote close, etc). Of course the
Unixware socket/TCP implementation could be broken in this respect, but
most TCP-using applications depend on this behaviour, so that's rather
unlikely.
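
To make the point concrete, here is a minimal sketch in C that checks this
behaviour on a given system. It is not taken from epmd: the helper name
send_then_close, the 5-second linger value and the use of a loopback
connection inside one process are all assumptions made for illustration.
One end writes a message and closes at once, with no sleep, and the other
end then reads whatever arrives.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of epmd): write msg on one end of a
 * loopback TCP connection, close that end immediately, then read on the
 * other end.  Returns the number of bytes the reader got, or -1 on error. */
long send_then_close(const char *msg, char *out, size_t outlen, int use_linger)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    size_t len, total = 0;
    ssize_t n;
    int lfd, rfd = -1, sfd = -1;
    long result = -1;

    assert(msg != NULL && out != NULL);
    len = strlen(msg);

    /* Listening socket on 127.0.0.1; port 0 lets the kernel pick one. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    /* rfd is the reading ("client") end, sfd the sending ("server") end. */
    rfd = socket(AF_INET, SOCK_STREAM, 0);
    if (rfd < 0 || connect(rfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;

    if (use_linger) {
        struct linger lg;
        lg.l_onoff = 1;   /* make close() wait for pending data ... */
        lg.l_linger = 5;  /* ... for at most 5 seconds (arbitrary value) */
        if (setsockopt(sfd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
            goto done;
    }

    /* Write and close straight away, with no sleep in between. */
    n = write(sfd, msg, len);
    if (n < 0 || (size_t)n != len)
        goto done;
    close(sfd);
    sfd = -1;

    /* Read until end-of-file or until the output buffer is full. */
    while (total < outlen &&
           (n = read(rfd, out + total, outlen - total)) > 0)
        total += (size_t)n;
    if (n < 0)
        goto done;
    result = (long)total;

done:
    if (sfd >= 0)
        close(sfd);
    if (rfd >= 0)
        close(rfd);
    close(lfd);
    return result;
}
```

On a correctly working TCP implementation I would expect the reader to get
the whole message whether use_linger is 0 or 1. If the data only arrives
with the option set (or with a sleep before the close), that would point at
the socket implementation rather than at epmd.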

>  To check on this theory I put in a sleep(1) right before
>the close.  This worked, net_adm:names() now returned correct

>The thing that is strange about this is that my WinNT machine can
>get names off the Unixware epmd just fine without any changes.
>Putting in the sleep in epmd before closing the socket definitely 
>made it work from the Unixware erl shell though.

All this, together with the fact that you previously reported that the
Unixware system couldn't get names from a remote epmd either (or is this
no longer the case?), definitely points to a problem on the Unixware
"client side", I think. First thing I would do to debug it would be to
add a receive clause for "Other" in erl_epmd.erl/do_get_names to see if
an error message, or something else "unexpected", is received (arguably
there should already be a clause for {tcp_error, Socket, Reason}).

--Per Hedeland
