[erlang-questions] Resisting "noconnection" / Remote termination of nodes
Valentin Nechayev
netch@REDACTED
Thu Apr 26 11:38:21 CEST 2012
Tue, Apr 24, 2012 at 13:52:18, olivier.boudeville wrote about "[erlang-questions] Resisting "noconnection" / Remote termination of nodes":
> As I want now these terminations to be synchronous (i.e. I want my
> terminate function to return only when all nodes are down for sure), I
> used to rely on checking their termination using net_adm:ping/1 (waiting
> for pong to become pang), but kept on getting (systematically)
> 'noconnection' errors (exceptions?), which do not seem to be catchable (at
> least not with a 'try .. catch T:E ->.. end' clause).
This looks like a good case for erlang:monitor_node/2. First, check each
node in the list and, if it is alive, subscribe to its nodedown messages
with monitor_node(). Then multicast the halt request and loop over the
incoming nodedown messages, dropping each reported node from the list.
Exit when the list is empty.
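
The steps above could be sketched roughly as follows. This is only an
illustration, not tested against your setup: the module name sync_halt,
the function halt_nodes/2, the Timeout argument and the use of
rpc:eval_everywhere/4 to deliver the halt request are my assumptions.

```erlang
-module(sync_halt).
-export([halt_nodes/2]).

%% Halt the reachable nodes in Nodes and return once each of them has
%% produced a nodedown message. Timeout (milliseconds) bounds the wait
%% for each individual nodedown message; it is not an overall deadline.
-spec halt_nodes([node()], non_neg_integer()) -> ok | {timeout, [node()]}.
halt_nodes(Nodes, Timeout) ->
    %% Step 1: keep only the nodes that answer a ping.
    Alive = [N || N <- lists:usort(Nodes), net_adm:ping(N) =:= pong],
    %% Step 2: subscribe to {nodedown, Node} for each of them. If a node
    %% died since the ping, the nodedown message is delivered at once.
    [erlang:monitor_node(N, true) || N <- Alive],
    %% Step 3: send the halt request to all of them, asynchronously.
    rpc:eval_everywhere(Alive, erlang, halt, []),
    %% Step 4: collect nodedown messages until the list is empty.
    wait_down(Alive, Timeout).

wait_down([], _Timeout) ->
    ok;
wait_down(Pending, Timeout) ->
    receive
        {nodedown, Node} ->
            %% lists:delete/2 is a no-op for nodes we are not waiting on.
            wait_down(lists:delete(Node, Pending), Timeout)
    after Timeout ->
        %% Give up: drop the remaining subscriptions and report them.
        [erlang:monitor_node(N, false) || N <- Pending],
        {timeout, Pending}
    end.
```

A call such as sync_halt:halt_nodes(Nodes, 5000) would then return ok
only after every node that was alive has gone down, or {timeout, Rest}
with the nodes that did not report in time.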
net_kernel:monitor_nodes() also provides a similar facility.
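
A comparable sketch with net_kernel:monitor_nodes/1, again only under
assumptions of mine (module name sync_halt_nk, halt_nodes/2, Timeout,
and rpc:eval_everywhere/4 for the halt request). Here the subscription
covers all visible nodes at once, and only nodes already listed in
nodes() are considered.

```erlang
-module(sync_halt_nk).
-export([halt_nodes/2]).

%% Halt the currently connected nodes among Nodes and wait for their
%% nodedown messages. Timeout (milliseconds) bounds the wait for each
%% individual message; it is not an overall deadline.
-spec halt_nodes([node()], non_neg_integer()) -> ok | {timeout, [node()]}.
halt_nodes(Nodes, Timeout) ->
    %% Subscribe before reading nodes(), so that no nodedown is missed.
    %% The return value is ignored here; a real caller may want to check it.
    _ = net_kernel:monitor_nodes(true),
    %% nodes() lists visible connected nodes only (hidden ones are skipped).
    Pending = [N || N <- lists:usort(Nodes), lists:member(N, nodes())],
    rpc:eval_everywhere(Pending, erlang, halt, []),
    Result = wait_down(Pending, Timeout),
    %% Unsubscribe; nodeup messages received meanwhile stay in the mailbox.
    _ = net_kernel:monitor_nodes(false),
    Result.

wait_down([], _Timeout) ->
    ok;
wait_down(Pending, Timeout) ->
    receive
        {nodedown, Node} ->
            %% Nodes outside Pending are simply ignored by lists:delete/2.
            wait_down(lists:delete(Node, Pending), Timeout)
    after Timeout ->
        {timeout, Pending}
    end.
```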
The only side issue I see is that monitor_node() tries to connect to the
node if there is no connection yet, so you can run into a race if some
other agent performs a similar action. If that is your case, move node
control into its own manager process.
> I feel I would need something like net_kernel:unconnect_node/1.
It's erlang:disconnect_node/1, but I doubt it is useful for your goal.
> My question now: how to deal gracefully with such a synchronous node
> shutdown and to resist to the (intended) loss of node(s)?
Something in your description is unclear to me, so feel free to
reformulate it.
-netch-