[erlang-questions] failures in supervised processes

Fri Oct 7 23:04:19 CEST 2011

# Tim Freeman 2011-10-07:
> Every document I have located refers to an exit signal being sent to the
> supervisor.  To me, this implies some participation of the process or the
> runtime

Yes, the runtime generates the exit signal whenever a process with a
nonempty link set dies. Monitors work the same way, except that the
watcher receives a 'DOWN' message instead of an exit signal.
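
A minimal sketch of how this looks from the receiving side. The module
and function names (exit_demo, run/0) are made up for illustration. The
caller traps exits so that the exit signal from the link arrives as an
ordinary message, and also sets a monitor on the same process:

```erlang
-module(exit_demo).
-export([run/0]).

%% Hypothetical demo: observe the death of one child process through
%% both a link and a monitor, and return the two reported reasons.
run() ->
    %% With trap_exit set, exit signals from linked processes are
    %% converted into {'EXIT', Pid, Reason} messages instead of
    %% killing the caller.
    process_flag(trap_exit, true),
    %% The child waits for `stop` so that the monitor is certain to be
    %% in place before the child exits.
    Pid = spawn_link(fun() -> receive stop -> exit(boom) end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    LinkReason =
        receive {'EXIT', Pid, R1} -> R1
        after 1000 -> timeout
        end,
    MonitorReason =
        receive {'DOWN', Ref, process, Pid, R2} -> R2
        after 1000 -> timeout
        end,
    {LinkReason, MonitorReason}.
```

Calling exit_demo:run() returns {boom, boom}. Note that the calling
process is left with trap_exit set.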

> on the remote node to indicate failure.  My question is about
> distributed processes that simply die/lose contact.
> 
> Let's say the remote process is on a computer whose network cable is suddenly
> removed.  What happens then, does the supervisor do its own active checks,
> what is the fallback mechanism there?

Normally you would have a supervisor only take care of local processes.
Supervising remote processes is probably technically possible, but isn't
something I'd contemplate for a production system (but maybe that's just
me).

Connection loss is a possibility your code needs to deal with. The right
response is too application-specific to handle generically: queue incoming
requests hoping for better times? Revert to a backup peer? Schedule a
reconnect attempt and fail all requests meanwhile? OTP is forthcoming in
that it gives you all the tools to build whatever solution is right for
the system at hand.
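
As for the unplugged cable: the local runtime notices the lost
connection itself (how quickly depends on the kernel net_ticktime
setting), and a monitor on a remote process then fires with the reason
noconnection. A rough sketch of telling the two cases apart follows.
The module name, function name and return atoms (peer_watch,
await_down/2, connection_lost, process_died, still_alive) are
assumptions for illustration, not an OTP API:

```erlang
-module(peer_watch).
-export([await_down/2]).

%% Hypothetical helper: wait up to Timeout milliseconds for Pid (local
%% or remote) to go away, and classify what happened.
await_down(Pid, Timeout) ->
    Ref = erlang:monitor(process, Pid),
    receive
        %% The connection to the node hosting Pid was lost. The process
        %% itself may well still be running on the far side.
        {'DOWN', Ref, process, _, noconnection} ->
            connection_lost;
        %% The process really terminated (or did not exist: noproc).
        {'DOWN', Ref, process, _, Reason} ->
            {process_died, Reason}
    after Timeout ->
        %% Drop the monitor and flush any 'DOWN' that raced with us.
        erlang:demonitor(Ref, [flush]),
        still_alive
    end.
```

What to do on connection_lost (queue, fail over, retry) is the
application-specific part.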

HTH,
	-- Jachym