[erlang-questions] Resisting "noconnection" / Remote termination of nodes
Felix Lange
fjl@REDACTED
Wed Apr 25 14:20:49 CEST 2012
Hi,
I'd suggest using net_kernel:monitor_nodes/1 for this purpose.
Example:
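    shutdown(Nodes) ->
        spawn_link(fun () ->
            %% Subscribe before halting so that no nodedown message is missed.
            net_kernel:monitor_nodes(true),
            lists:foreach(fun (Node) ->
                              rpc:cast(Node, erlang, halt, [])
                          end, Nodes),
            wait_dead(Nodes)
        end).

    wait_dead([]) ->
        ok;
    wait_dead([Node | Rest]) ->
        receive
            %% monitor_nodes/1 delivers two-element {nodedown, Node} tuples;
            %% the three-element form only appears with monitor_nodes/2 options.
            {nodedown, Node} ->
                wait_dead(Rest)
        after
            10000 ->
                error_logger:error_report([{shutdown_timeout, Node}]),
                wait_dead(Rest)
        end.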
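Since the original question asks for a terminate function that returns only once every node is down, here is a rough sketch of a blocking variant. It is not tested against a real cluster; the module name sync_shutdown, the function shutdown_sync/2 and its return values are made up for illustration, and it assumes an OTP release recent enough to provide erlang:monotonic_time(millisecond).

```erlang
-module(sync_shutdown).
-export([shutdown_sync/2]).

%% Hypothetical helper: halts every node in Nodes and blocks until each one
%% has been reported down, or until Timeout milliseconds have elapsed overall.
%% Returns ok, or {timeout, StillUp} with the nodes not yet seen going down.
shutdown_sync(Nodes, Timeout) ->
    %% Subscribe first so that no nodedown message can be missed.
    net_kernel:monitor_nodes(true),
    %% Only currently connected nodes can produce a nodedown message;
    %% anything else is treated as already down.
    Connected = [N || N <- Nodes, lists:member(N, nodes())],
    lists:foreach(fun (N) -> rpc:cast(N, erlang, halt, []) end, Connected),
    Deadline = erlang:monotonic_time(millisecond) + Timeout,
    Result = wait_dead(Connected, Deadline),
    net_kernel:monitor_nodes(false),
    Result.

wait_dead([], _Deadline) ->
    ok;
wait_dead(Pending, Deadline) ->
    %% One shared deadline instead of a fresh timeout per node.
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {nodedown, Node} ->
            %% Messages may arrive in any order; unrelated nodes are ignored.
            wait_dead(lists:delete(Node, Pending), Deadline)
    after Remaining ->
        {timeout, Pending}
    end.
```

Called as sync_shutdown:shutdown_sync(Nodes, 10000) from the controlling process, this waits in the caller instead of in a spawned process. Note that nodeup messages received while subscribed are left in the caller's mailbox, so a long-lived caller may want to flush them afterwards.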
On Tue, 24 Apr 2012 13:52:18 +0200, Olivier BOUDEVILLE
<olivier.boudeville@REDACTED> wrote:
> Hi,
>
> For a more controlled overall termination of a distributed application, I
> try to shutdown synchronously a series of nodes, as properly and as in
> parallel as possible, in a non-OTP program. I imagine that using '[
> rpc:cast( N, erlang, halt, [] ) || N <- MyTargetNodes ]' and then waiting
> for them to be terminated is the best approach for that.
>
> As I want now these terminations to be synchronous (i.e. I want my
> terminate function to return only when all nodes are down for sure), I
> used to rely on checking their termination using net_adm:ping/1 (waiting
> for pong to become pang), but kept on getting (systematically)
> 'noconnection' errors (exceptions?), which do not seem to be catchable (at
> least not with a 'try .. catch T:E ->.. end' clause). This happens as soon
> as there is at least one node (which happens to be on the same host - of
> course it is not the local node from which that rpc:cast is triggered) to
> halt.
>
> I switched to looping on 'lists:member( Nodename, nodes() )' instead of
> ping (in both case with a proper waiting between checks), but I still get
> 'noconnection' errors. It looks like 'noconnection' is VM-level? As
> expected, commenting-out the rpc:cast/3 never leads to 'noconnection'.
>
> I feel I would need something like net_kernel:unconnect_node/1.
>
> My question now: how to deal gracefully with such a synchronous node
> shutdown and withstand the (intended) loss of node(s)?
>
> Thanks in advance for any hint!
> Best regards,
>
> Olivier.
> ---------------------------
> Olivier Boudeville
>
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
> Département SINETICS, groupe ASICS (I2A), bureau B-226
> Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47
> 65 27 13