[erlang-questions] Resisting "noconnection" / Remote termination of nodes

Felix Lange fjl@REDACTED
Wed Apr 25 14:26:06 CEST 2012


Please note that shutdown/1 is not synchronous in my example.
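
If a blocking variant is needed, a minimal sketch could look like the one below. It runs in the calling process instead of a spawned one and returns only once every targeted node has been reported down or an overall timeout has expired. The module name shutdown_sync, the function names and the TimeoutMs parameter are illustrative assumptions, not part of the example quoted below. It also assumes the calling process has no other net_kernel:monitor_nodes subscription of its own, since the sketch consumes nodedown messages and unsubscribes at the end.

```erlang
-module(shutdown_sync).
-export([shutdown/2, wait_down/2]).

%% Hypothetical blocking variant: halts the given nodes and returns
%% ok once all of them are down, or {timeout, StillPending} if
%% TimeoutMs milliseconds elapse first.
shutdown(Nodes, TimeoutMs) ->
    %% Subscribe before sending any halt request so that no nodedown
    %% message can be missed.
    net_kernel:monitor_nodes(true),
    try
        %% A node that is not connected never produces a nodedown
        %% message, so only wait for nodes currently listed in nodes().
        Connected = [N || N <- Nodes, lists:member(N, nodes())],
        lists:foreach(fun(N) -> rpc:cast(N, erlang, halt, []) end,
                      Connected),
        wait_down(Connected, TimeoutMs)
    after
        net_kernel:monitor_nodes(false)
    end.

%% Waits until a {nodedown, Node} message has arrived for every node
%% in Nodes, using a single deadline for the whole list.
wait_down(Nodes, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    wait_down_until(Nodes, Deadline).

wait_down_until([], _Deadline) ->
    ok;
wait_down_until(Pending, Deadline) ->
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        %% monitor_nodes/1 delivers two-element tuples.
        {nodedown, Node} ->
            wait_down_until(lists:delete(Node, Pending), Deadline)
    after Remaining ->
        {timeout, Pending}
    end.
```

A caller would then use something like shutdown_sync:shutdown(MyTargetNodes, 10000) and inspect the result. A single deadline is used here so that the total wait stays bounded regardless of how many nodes are involved.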

On Wed, 25 Apr 2012 14:20:49 +0200, Felix Lange <fjl@REDACTED> wrote:

> Hi,
>
> I'd suggest using net_kernel:monitor_nodes/1 for this purpose.
>
> Example:
>
> shutdown(Nodes) ->
>      spawn_link(fun () ->
>                         net_kernel:monitor_nodes(true),
>                         lists:foreach(fun (Node) ->
>                                               rpc:cast(Node, erlang, halt, [])
>                                       end, Nodes),
>                         wait_dead(Nodes)
>                 end).
>
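> wait_dead([]) ->
>      ok;
> wait_dead([Node | Rest]) ->
>      receive
>          %% monitor_nodes/1 delivers {nodedown, Node}; the three-element
>          %% form is only sent when monitor_nodes/2 gets non-empty options.
>          {nodedown, Node} ->
>              wait_dead(Rest)
>      after
>          10000 ->
>              error_logger:error_report([{shutdown_timeout, Node}]),
>              wait_dead(Rest)
>      end.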
>
> On Tue, 24 Apr 2012 13:52:18 +0200, Olivier BOUDEVILLE  
> <olivier.boudeville@REDACTED> wrote:
>
>> Hi,
>>
>> For a more controlled overall termination of a distributed application, I
>> am trying to shut down a series of nodes synchronously, as cleanly and as
>> much in parallel as possible, in a non-OTP program. I imagine that using
>> '[ rpc:cast( N, erlang, halt, [] ) || N <- MyTargetNodes ]' and then
>> waiting for them to terminate is the best approach for that.
>>
>> As I now want these terminations to be synchronous (i.e. I want my
>> terminate function to return only once all nodes are definitely down), I
>> used to check their termination with net_adm:ping/1 (waiting for pong to
>> become pang), but I systematically got 'noconnection' errors
>> (exceptions?), which do not seem to be catchable (at least not with a
>> 'try .. catch T:E -> .. end' clause). This happens as soon as there is at
>> least one node to halt (which happens to be on the same host; of course it
>> is not the local node from which the rpc:cast is triggered).
>>
>> I switched to looping on 'lists:member( Nodename, nodes() )' instead of
>> ping (in both cases with a proper wait between checks), but I still get
>> 'noconnection' errors. It looks like 'noconnection' is VM-level? As
>> expected, commenting out the rpc:cast/4 call never leads to 'noconnection'.
>>
>> I feel I would need something like net_kernel:unconnect_node/1.
>>
>> My question now: how can I deal gracefully with such a synchronous node
>> shutdown and withstand the (intended) loss of node(s)?
>>
>> Thanks in advance for any hint!
>> Best regards,
>>
>> Olivier.
>> ---------------------------
>> Olivier Boudeville
>>
>> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
>> Département SINETICS, groupe ASICS (I2A), bureau B-226
>> Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47
>> 65 27 13