Determining if process is alive on a different node

Ulf Wiger (AL/EAB)
Tue Aug 16 11:40:22 CEST 2005


Agreed. There should really be a housekeeping process
that removes dead pids from the tables. This process
could monitor the other node, and remove all pids that
belong to that node if it dies.

Per Hedeland wrote:
> 
> Well, the mechanism does cover that "simple example" (and
> that's basically what it was designed to do), but for
> anyone that really needs to rely on old Pids never
> matching new processes, it's probably a good idea to
> make sure what guarantees are actually provided - 
> e.g. when the remote host does a full reboot, or the
> Pids are "very" old.

Agreed. There should really be some housekeeping process
that removes dead pids from the table(s) if the other 
node goes down. If this is done by a background process,
there is a race condition to consider.
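
A minimal sketch of such a housekeeping process, assuming the pids
live in a public ETS table of {Key, Pid} tuples. The module name
pid_janitor and the table layout are made up for illustration. It
subscribes to node status messages with net_kernel:monitor_nodes/1
and purges on nodedown. Because it runs in the background, another
process can still read a stale pid between the node going down and
the purge, which is the race mentioned above.

```erlang
-module(pid_janitor).
-export([start/1, init/1, purge/2]).

%% Start a housekeeping process for the ETS table Tab.
%% Assumption: Tab is a public table holding {Key, Pid} tuples.
start(Tab) ->
    spawn(?MODULE, init, [Tab]).

init(Tab) ->
    %% Ask for {nodeup, Node} and {nodedown, Node} messages.
    ok = net_kernel:monitor_nodes(true),
    loop(Tab).

loop(Tab) ->
    receive
        {nodedown, Node} ->
            purge(Tab, Node),
            loop(Tab);
        {nodeup, _Node} ->
            loop(Tab);
        stop ->
            ok
    end.

%% Delete every {Key, Pid} entry whose Pid belongs to Node.
%% Returns the number of deleted entries.
purge(Tab, Node) ->
    MatchSpec = [{{'_', '$1'}, [{'=:=', {node, '$1'}, Node}], [true]}],
    ets:select_delete(Tab, MatchSpec).
```

purge/2 is exported so that it can also be called directly, e.g.
right before a lookup, if the background purge is not timely enough.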

If the other node restarts quickly, the creation number
will make sure that there is no mixup; if the other node
does a full reboot, there is ample time to remove the 
pids (at least assuming that you are running Erlang on 
normal machines).

To be really sure, you can change the dist_auto_connect
option in the kernel application. If this is set to 'never',
nodes will not automatically re-connect, and you have full
control over the time between a node dying and it reconnecting.
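
For illustration, the option could be set in a sys.config file
roughly like this (the same thing can be given on the command line
as "erl -kernel dist_auto_connect never"). With this setting,
connections have to be made explicitly, e.g. by calling
net_kernel:connect_node/1 once the cleanup is done.

```erlang
%% sys.config fragment: never set up node connections automatically.
[
 {kernel, [
   {dist_auto_connect, never}
 ]}
].
```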

/Uffe

> -----Original Message-----
> From: Per Hedeland [mailto:]
> Sent: den 16 augusti 2005 09:43
> To: ; ; Ulf Wiger (AL/EAB)
> Subject: RE: Determining if process is alive on a different node
> 
> 
> "Ulf Wiger (AL/EAB)" <> wrote:
> >
> >Klacke is right, as this simple example shows:
> 
> Well, the mechanism does cover that "simple example" (and that's
> basically what it was designed to do), but for anyone that really
> needs to rely on old Pids never matching new processes, it's probably
> a good idea to make sure what guarantees are actually provided - e.g.
> when the remote host does a full reboot, or the Pids are "very" old.
> 
> In the old days the "creation number" was only two bits, with state
> maintained by epmd - and looking at current epmd source, it seems that
> *it* does the same today (actually only 1..3 is used). But maybe there
> is more magic on the Erlang node side these days, I haven't checked.
> 
> --Per Hedeland
> 


