[erlang-questions] Resisting "noconnection" / Remote termination of nodes

Olivier BOUDEVILLE olivier.boudeville@REDACTED
Thu Apr 26 17:30:56 CEST 2012


Hi Kannan,

Indeed there was already a master node which, once the termination was 
agreed upon (consensus is easy to obtain in my case), was enforcing it, 
shutting down distributed services synchronously and in the right order.

So the tear-down phase was fully planned, except that a 'noconnection' 
error suggested that I had at least one reckless process that was 
attempting to reach an already halted node.

My question was whether 'noconnection' was VM-level and uncatchable (not 
directly triggered by a specific operation that I could have wrapped in a 
try/catch clause) and whether we could have a little more information than 
just 'noconnection' (source/target node/pid). 
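
For context: in Erlang, 'noconnection' is the exit reason delivered through a link (and reported in a monitor's 'DOWN' message) when the connection to the remote process's node is lost. It is an exit signal rather than an exception, which would explain why a try/catch clause does not intercept it. Assuming that some local process is linked to a process on a halted node (this is an assumption, the thread does not confirm it), a minimal sketch for recovering the source pid and node could look like this. The module and function names are hypothetical.

```erlang
-module(noconn_watch).
-export([watch/1]).

%% Hypothetical helper: link to RemotePid and report how the link ends.
%% Trapping exits turns the exit signal into an ordinary message.
%% Note that the caller keeps trapping exits after this function returns.
watch(RemotePid) ->
    process_flag(trap_exit, true),
    link(RemotePid),
    receive
        {'EXIT', RemotePid, noconnection} ->
            %% The pid in the message identifies the lost peer and its node.
            {lost_connection, node(RemotePid), RemotePid};
        {'EXIT', RemotePid, Reason} ->
            {exited, Reason}
    end.
```

A process that does not trap exits is simply terminated by the signal, so only the bare reason shows up in its crash report.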

Anyway, replacing rpc:cast( N, erlang, halt, [] ) with a direct halt() 
performed by a process already running on each node did the trick.
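
A rough sketch of that arrangement, assuming one small agent process is started on every node at launch time and registered under a local name. The module name, the registered name halt_agent and the halt_now message are all hypothetical; the original post does not show its actual code.

```erlang
-module(halt_agent).
-export([start/0, request_halt/1]).

%% To be called once on each node: spawns the agent and registers it
%% locally under the (assumed) name halt_agent.
start() ->
    register(halt_agent, spawn(fun loop/0)).

%% The agent waits for the halt request and then halts its own node.
loop() ->
    receive
        halt_now -> erlang:halt();
        _Other -> loop()
    end.

%% Called on the master: asks the agent on every target node to halt.
%% Sending to {Name, Node} does not fail if the node is unreachable.
request_halt(Nodes) ->
    [{halt_agent, N} ! halt_now || N <- Nodes],
    ok.
```

The request is a plain message to a registered name, so no rpc server call is involved on the target node.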

Thanks,
Best regards,

Olivier Boudeville.
---------------------------
Olivier Boudeville

EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
Département SINETICS, groupe ASICS (I2A), bureau B-226
Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47 
65 27 13



vasdeveloper@REDACTED 
26/04/2012 12:34

To
olivier.boudeville@REDACTED
cc
erlang-questions@REDACTED
Subject
Re: [erlang-questions] Resisting "noconnection" / Remote termination of 
nodes






Hi Olivier,

In distributed computing, we see the collective effort of individual -- 
but not independent -- nodes. Communicating nodes need to stay connected 
at all times and to know about one another, including each other's 
availability status. The OAM node, or leader, has the complete picture of 
all the nodes.

When it comes to stopping a distributed system, you have to look at the 
bigger picture of the whole system. The following steps are generally 
taken to tear down a system.

* Signal all the nodes of a forthcoming shutdown request
* Stop accepting new requests
* Finish servicing the accepted requests
* Take stock and clean up
* Check the readiness of all the nodes for shutdown
* Then call shutdown on all the nodes.
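
The steps above can be sketched as a coordinator that sends each phase to every participant and waits for all acknowledgements before moving on. This is only an illustration: the phase atoms, the message shapes and the module name are assumptions, and the participant shown does no real work.

```erlang
-module(teardown).
-export([run/2, participant/0]).

%% Phases mirror the list above; the atom names are assumptions.
phases() ->
    [announce_shutdown, stop_accepting, drain_requests,
     clean_up, confirm_ready].

%% Runs every phase in order. Each phase must be acknowledged by every
%% participant before the next one starts.
run(Participants, TimeoutMs) ->
    run_phases(phases(), Participants, TimeoutMs).

run_phases([], _Participants, _TimeoutMs) ->
    %% All phases acknowledged: the caller may now halt the nodes.
    ready_for_shutdown;
run_phases([Phase | Rest], Participants, TimeoutMs) ->
    Ref = make_ref(),
    [P ! {phase, Phase, Ref, self()} || P <- Participants],
    case collect_acks(Participants, Ref, TimeoutMs) of
        ok -> run_phases(Rest, Participants, TimeoutMs);
        {error, Pending} -> {error, {Phase, no_ack_from, Pending}}
    end.

%% Waits until every pending participant has acknowledged. The timeout
%% applies to each wait for a message, not to the phase as a whole.
collect_acks([], _Ref, _TimeoutMs) ->
    ok;
collect_acks(Pending, Ref, TimeoutMs) ->
    receive
        {ack, Ref, P} ->
            collect_acks(lists:delete(P, Pending), Ref, TimeoutMs)
    after TimeoutMs ->
        {error, Pending}
    end.

%% Minimal participant that acknowledges every phase without doing work.
participant() ->
    receive
        {phase, _Phase, Ref, From} ->
            From ! {ack, Ref, self()},
            participant()
    end.
```

The final step, actually halting the nodes, is left to the caller once run/2 returns ready_for_shutdown.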

Here, I am looking at the system from the top. The shutdown will generally 
be coordinated by a single process.

An ad hoc shutdown in a production environment is scarier to me.

Kind Regards,
Kannan.




On Tue, Apr 24, 2012 at 5:22 PM, Olivier BOUDEVILLE <
olivier.boudeville@REDACTED> wrote:

Hi, 

For a more controlled overall termination of a distributed application, I 
am trying to shut down a series of nodes synchronously, as cleanly and 
with as much parallelism as possible, in a non-OTP program. I imagine that 
using '[ rpc:cast( N, erlang, halt, [] ) || N <- MyTargetNodes ]' and then 
waiting for them to terminate is the best approach for that. 

As I now want these terminations to be synchronous (i.e. I want my 
terminate function to return only when all nodes are down for sure), I 
first relied on checking their termination using net_adm:ping/1 (waiting 
for pong to become pang), but systematically got 'noconnection' errors 
(exceptions?), which do not seem to be catchable (at least not with a 
'try .. catch T:E ->.. end' clause). This happens as soon as there is at 
least one node to halt (it happens to be on the same host; of course it 
is not the local node from which that rpc:cast is triggered). 

I switched to looping on 'lists:member( Nodename, nodes() )' instead of 
ping (in both cases with a proper wait between checks), but I still get 
'noconnection' errors. It looks like 'noconnection' is VM-level? As 
expected, commenting out the rpc:cast/4 call never leads to 'noconnection'. 
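
As an alternative to polling with ping or nodes(), the wait can be driven by node monitors: erlang:monitor_node/2 makes the runtime send a {nodedown, Node} message to the caller when the connection to that node is lost. The following is a minimal sketch, with a hypothetical module name; it only addresses the waiting and is not claimed to remove the 'noconnection' error by itself.

```erlang
-module(node_wait).
-export([wait_down/2]).

%% Subscribes to nodedown notifications for every node, then blocks
%% until all of them have been reported down or a wait times out.
%% If a node is already unreachable, nodedown is delivered at once.
wait_down(Nodes, TimeoutMs) ->
    [erlang:monitor_node(N, true) || N <- Nodes],
    collect(Nodes, TimeoutMs).

%% The timeout applies to each wait for a message, not to the whole
%% operation.
collect([], _TimeoutMs) ->
    ok;
collect(Pending, TimeoutMs) ->
    receive
        {nodedown, N} ->
            collect(lists:delete(N, Pending), TimeoutMs)
    after TimeoutMs ->
        {error, {still_up, Pending}}
    end.
```

To avoid missing a notification, the monitors would have to be set up before the halt requests are sent, so in practice the subscription and the waiting would be split around that step.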

I feel I would need something like net_kernel:unconnect_node/1. 

My question now: how can I perform such a synchronous node shutdown 
gracefully and withstand the (intended) loss of node(s)? 

Thanks in advance for any hint! 
Best regards, 

Olivier.
---------------------------
Olivier Boudeville

EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
Département SINETICS, groupe ASICS (I2A), bureau B-226
Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47 
65 27 13

Ce message et toutes les pièces jointes (ci-après le 'Message') sont 
établis à l'intention exclusive des destinataires et les informations qui 
y figurent sont strictement confidentielles. Toute utilisation de ce 
Message non conforme à sa destination, toute diffusion ou toute 
publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de 
le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou 
partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de 
votre système, ainsi que toutes ses copies, et de n'en garder aucune trace 
sur quelque support que ce soit. Nous vous remercions également d'en 
avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie 
électronique arrivent en temps utile, sont sécurisées ou dénuées de toute 
erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for 
the addressees. The information contained in this Message is confidential. 
Any use of information contained in this Message not in accord with its 
purpose, any dissemination or disclosure, either whole or partial, is 
prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use 
any part of it. If you have received this message in error, please delete 
it and all copies from your system and notify the sender immediately by 
return message.
E-mail communication cannot be guaranteed to be timely secure, error or 
virus-free.





