[erlang-questions] Resisting "noconnection" / Remote termination of nodes

Felix Lange fjl@REDACTED
Wed Apr 25 14:20:49 CEST 2012


Hi,

I'd suggest using net_kernel:monitor_nodes/1 for this purpose.

Example:

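shutdown(Nodes) ->
     Caller = self(),
     %% Run the shutdown in a helper process so that the nodeup/nodedown
     %% messages from monitor_nodes/1 never reach the caller's mailbox.
     Helper = spawn_link(fun () ->
                                 %% Subscribe before halting, so no nodedown is missed.
                                 ok = net_kernel:monitor_nodes(true),
                                 lists:foreach(fun (Node) ->
                                                       rpc:cast(Node, erlang, halt, [])
                                               end, Nodes),
                                 wait_dead(Nodes),
                                 Caller ! {self(), nodes_down}
                         end),
     %% Block until every node is down (or has timed out in wait_dead/1).
     receive
         {Helper, nodes_down} -> ok
     end.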

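%% monitor_nodes(true) delivers {nodedown, Node} (a 2-tuple); the 3-tuple
%% {nodedown, Node, Info} is only sent when options are given to monitor_nodes/2.
wait_dead([]) ->
     ok;
wait_dead([Node | Rest]) ->
     receive
         {nodedown, Node} ->
             wait_dead(Rest)
     after
         10000 ->
             %% Give up on this node after 10 s and carry on with the rest.
             error_logger:error_report([{shutdown_timeout, Node}]),
             wait_dead(Rest)
     end.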
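One possible refinement, sketched here as an assumption rather than as part of the suggestion above: wait_dead/1 allows up to 10 seconds per node, so a long list of unresponsive nodes can block for a long time. The module below (the module name shutdown_wait and the function wait_dead/2 are hypothetical) applies a single overall deadline instead and returns the nodes that were never reported down. It assumes a recent OTP release (erlang:monotonic_time/1 with the millisecond unit did not exist in 2012) and that the calling process has already called net_kernel:monitor_nodes(true).

```erlang
-module(shutdown_wait).
-export([wait_dead/2]).

%% Hypothetical helper: wait until every node in Nodes has been reported
%% down, or until TimeoutMs milliseconds have elapsed in total (not per node).
%% Assumes the caller already subscribed via net_kernel:monitor_nodes(true),
%% so it receives {nodedown, Node} messages.
%% Returns ok, or {timeout, StillUp} with the nodes never reported down.
wait_dead(Nodes, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    wait_loop(Nodes, Deadline).

wait_loop([], _Deadline) ->
    ok;
wait_loop(Pending, Deadline) ->
    %% Time left until the overall deadline, never negative.
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {nodedown, Node} ->
            %% Drop the node if we were waiting for it; otherwise ignore it.
            wait_loop(lists:delete(Node, Pending), Deadline)
    after Remaining ->
        {timeout, Pending}
    end.
```

To use it, the spawned process in shutdown/1 could call shutdown_wait:wait_dead(Nodes, 10000) in place of wait_dead(Nodes) and log any {timeout, StillUp} result. Because {nodedown, Node} is an ordinary message, the function can be exercised without a cluster by sending such messages to self() first.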

On Tue, 24 Apr 2012 13:52:18 +0200, Olivier BOUDEVILLE  
<olivier.boudeville@REDACTED> wrote:

> Hi,
>
> For a more controlled overall termination of a distributed application, I
> am trying to shut down a series of nodes synchronously, as cleanly and as
> much in parallel as possible, in a non-OTP program. I imagine that using
> '[ rpc:cast( N, erlang, halt, [] ) || N <- MyTargetNodes ]' and then
> waiting for them to terminate is the best approach for that.
>
> Since I now want these terminations to be synchronous (i.e. I want my
> terminate function to return only once all nodes are definitely down), I
> used to check for their termination with net_adm:ping/1 (waiting for pong
> to become pang), but kept getting (systematically) 'noconnection' errors
> (exceptions?), which do not seem to be catchable (at least not with a
> 'try .. catch T:E -> .. end' clause). This happens as soon as there is at
> least one node to halt (it happens to be on the same host, though of
> course it is not the local node from which that rpc:cast is triggered).
>
> I switched to looping on 'lists:member( Nodename, nodes() )' instead of
> ping (in both cases with a suitable wait between checks), but I still get
> 'noconnection' errors. It looks like 'noconnection' is VM-level? As
> expected, commenting out the rpc:cast/3 never leads to 'noconnection'.
>
> I feel I would need something like net_kernel:unconnect_node/1.
>
> My question is: how can I handle such a synchronous node shutdown
> gracefully and withstand the (intended) loss of node(s)?
>
> Thanks in advance for any hint!
> Best regards,
>
> Olivier.
> ---------------------------
> Olivier Boudeville
>
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
> SINETICS Department, ASICS group (I2A), office B-226
> Office: +33 1 47 65 59 58 / Mobile: +33 6 16 83 37 22 / Fax: +33 1 47 65 27 13


