epmd death

Kent Boortz kent@REDACTED
Sat Sep 4 19:36:30 CEST 2004


"Ernie Makris" <ernie.makris@REDACTED> writes:
> I have a concern that if epmd for some reason crashes, then my
> distributed nodes can't contact each other even if a new epmd is
> started. I have two distributed nodes setup on the same machine. I
> then kill epmd and then try to have one node rpc to another, which
> gives me a {badrpc,nodedown}.
>
> I took a look at net_kernel and erl_epmd and there doesn't look like
> there is a reconnection feature. Does anyone ever have any problems of
> this happening? Is there any workaround?
>
> Of course I could setup a separate socket and communicate through
> that, but it defeats the purpose of distributed erlang:(

Epmd is written to be small and simple to avoid problems with it
crashing. There have been very few bug reports (only one serious one
that I can remember) since the code was cleaned up and test cases were
added many years back.

But it can of course happen (*). The Erlang node keeps a socket
connection to epmd, so it should not be that hard for an Erlang node to
detect that epmd has died and try to restart it. For compatibility
with VxWorks, and other OSes that don't detect a close on a socket,
there should probably be some sort of periodic ping between the node
and epmd. The only complication with the restarting is that there may
be several nodes on the same machine that all try to restart epmd at
the same time. But this is not that hard to handle.
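
To make the periodic ping idea concrete, here is a minimal sketch of a
liveness probe, written in Python only for illustration (a real fix
would live inside the Erlang node). It assumes epmd listens on its
default port 4369 and answers the protocol's NAMES_REQ request (tag
110) with a reply that starts with a 4-byte port number. The function
name epmd_alive, the default host and the timeout are hypothetical
choices made for this example, not part of epmd or OTP.

```python
import socket
import struct

EPMD_PORT = 4369   # default epmd port
NAMES_REQ = 110    # request tag 'n' in the epmd protocol


def epmd_alive(host="127.0.0.1", port=EPMD_PORT, timeout=2.0):
    """Return True if something on host:port answers a NAMES_REQ.

    Hypothetical helper for illustration; not an OTP API.
    """
    # Requests are framed as a 2-byte big-endian length plus the body.
    request = struct.pack(">HB", 1, NAMES_REQ)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            # The reply begins with a 4-byte big-endian port number.
            header = b""
            while len(header) < 4:
                chunk = sock.recv(4 - len(header))
                if not chunk:
                    return False  # peer closed before a full reply
                header += chunk
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False
```

A watchdog could call epmd_alive() every few seconds and, after a few
consecutive failures, try to start epmd again and re-register the node.
An explicit probe like this does not rely on the OS reporting that the
long-lived registration socket was closed.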

kent

(*) It is known that there have been product setups that use "in
place" updates of the epmd binary. If you upgrade the binary for a
running program, the program will die on some (most/all?) Unixes. The
program may not die directly when the binary is updated; it may take
some time until the OS runs into problems because of the original
binary being missing. Other than that there are no known problems with
epmd that I'm aware of. Except the fact that a simple "epmd -kill" by
any user on a machine will kill epmd ;-)
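
One common way to avoid disturbing a running program during an upgrade
is to write the new binary to a temporary file in the same directory
and rename it over the old path, rather than overwriting the old file.
On Unix the rename only swaps the directory entry, so a process that
already has the old file open or mapped keeps its original contents.
The sketch below, in Python, assumes the failure described above comes
from overwriting the file in place; the name install_atomically and
its parameters are hypothetical.

```python
import os
import tempfile


def install_atomically(new_bytes, target_path, mode=0o755):
    """Replace target_path with new_bytes without touching the old inode.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temp file must be on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, mode)
        # Atomic swap of the directory entry; the old inode lives on
        # for as long as any process still uses it.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The running epmd would keep using the old binary until it is
restarted, so the upgrade takes effect only at the next start.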

-- 
Kent Boortz, Senior Software Developer
MySQL AB, www.mysql.com
Office: +46 8 590 910 63
Mobile: +46 70 279 11 71



More information about the erlang-questions mailing list