robustness

Sat Jun 17 14:01:21 CEST 2000

Hal Snyder <hal@REDACTED> wrote:
>Erlang literature mentions linked processes to alert one process if
>another signals an exit, and heartbeat processes to react if the local
>node stops.

Yes, though of course in the latter case it has to be an external
process, such as the one implemented by the 'heart' module.

>Is there a common Erlang approach to dealing with loss of a remote
>node?

Well, how to deal with it is basically application-specific. To detect
it, besides the net_kernel:monitor_nodes/1 that Geoff mentioned, you can
use the monitor_node/2 BIF. And of course, if your process is linked to
a process on a remote node, it will also receive an EXIT signal if the
node "dies" (i.e. the signal is synthesized locally).
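
A minimal sketch of the two detection methods, for illustration only.
The module name node_watch and the functions watch_node/1 and
watch_linked/1 are made up for this example; monitor_node/2, link/1 and
process_flag/2 are the real BIFs. It assumes the local node is running
distributed, and what to do on failure is left as a placeholder since
that part is application-specific.

```erlang
-module(node_watch).
-export([watch_node/1, watch_linked/1]).

%% Wait until the connection to Node is lost.
%% monitor_node/2 arranges for a {nodedown, Node} message to be sent to
%% the calling process. If no connection exists, one is attempted, and
%% a failed attempt also results in a nodedown message.
watch_node(Node) ->
    true = monitor_node(Node, true),
    receive
        {nodedown, Node} ->
            %% Application-specific reaction goes here (placeholder).
            io:format("node ~p is down~n", [Node]),
            nodedown
    end.

%% Link to a process on a remote node and wait for its exit signal.
%% With trap_exit set, the signal arrives as an ordinary message. If
%% the remote node dies, the signal is synthesized locally and the
%% reason is 'noconnection'.
watch_linked(RemotePid) ->
    process_flag(trap_exit, true),
    link(RemotePid),
    receive
        {'EXIT', RemotePid, noconnection} ->
            %% The node (or the connection to it) went away.
            io:format("lost node ~p~n", [node(RemotePid)]),
            noconnection;
        {'EXIT', RemotePid, Reason} ->
            %% The process itself exited while the node stayed up.
            {exited, Reason}
    end.
```

Something like spawn(node_watch, watch_node, ['other@host']) would run
the first variant in a process of its own; the node name is again just
an example.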

>How does a set of networked nodes know, for example, if one of them
>suddenly ceases to function (loses power, etc.) without sending any
>exit signals?

It uses a mechanism similar to TCP "keepalive" - i.e. "tick" messages
are sent periodically between connected nodes if there is no other
traffic, and lack of reception of any messages from a node for a longer
period is taken as an indication that it is dead. See e.g. the
description of net_ticktime in the kernel application documentation.
(There's a typo there, s/anything/nothing/.:-)
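
Since net_ticktime is a kernel application parameter, it can be set in
a system configuration file, roughly as below. The value 30 is an
arbitrary example and not a recommendation; the documented default is
60 seconds.

```erlang
%% sys.config fragment: kernel application parameters.
%% net_ticktime is given in seconds; 30 is only an example value.
[{kernel, [{net_ticktime, 30}]}].
```

The same thing can be given on the command line as
"erl -kernel net_ticktime 30". It is probably wise to use the same
value on all nodes that connect to each other.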

--Per Hedeland
per@REDACTED