[erlang-questions] Resisting "noconnection" / Remote termination of nodes
Olivier BOUDEVILLE
olivier.boudeville@REDACTED
Thu Apr 26 17:30:56 CEST 2012
Hi Kannan,
Indeed there was already a master node which, once the termination was
agreed upon (consensus is easy to obtain in my case), was enforcing it,
shutting down distributed services synchronously and in the right order.
So the tear-down phase was fully planned, except that a 'noconnection'
error suggested that I had at least one reckless process that was
attempting to reach an already halted node.
My question was whether 'noconnection' was VM-level and uncatchable (not
directly triggered by a specific operation that I could have wrapped in a
try/catch clause) and whether we could have a little more information than
just 'noconnection' (source/target node/pid).
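For readers of the archive, a minimal sketch of one possible explanation, which the thread itself does not confirm: when a process is linked to a process on a node that goes down, it receives an exit signal with reason noconnection. Exit signals arrive asynchronously, so a try/catch around ordinary code cannot intercept them, but a process that traps exits receives them as messages. The module name noconnection_demo is hypothetical, and a local process exiting with reason noconnection stands in for the lost remote peer.

```erlang
-module(noconnection_demo).
-export([run/0]).

%% Returns {trapped, noconnection} when the exit signal was converted
%% into a message instead of terminating the caller.
run() ->
    %% With trap_exit set, exit signals from linked processes are
    %% delivered as {'EXIT', Pid, Reason} messages.
    OldFlag = process_flag(trap_exit, true),
    %% Stand-in (assumption) for a linked process on a halted node:
    %% a local process that exits with reason 'noconnection'.
    Pid = spawn_link(fun() -> exit(noconnection) end),
    Result = receive
                 {'EXIT', Pid, Reason} -> {trapped, Reason}
             after 1000 ->
                 no_signal
             end,
    %% Restore the previous flag so the caller is left unchanged.
    process_flag(trap_exit, OldFlag),
    Result.
```

Without trap_exit, the same signal would simply terminate the linked process, which would match the "uncatchable" behaviour described above.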
Anyway, replacing rpc:cast( N, erlang, halt, [] ) with a direct halt()
call made by a process already running on each node did the trick.
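A rough sketch of that approach, not the actual code used here: a resident process on each node calls halt() when asked, and the coordinator waits for the nodedown messages produced by erlang:monitor_node/2 rather than polling. The module name node_shutdown, the registered name halter and the halt_now message are all hypothetical.

```erlang
-module(node_shutdown).
-export([halter_loop/0, stop_nodes/2]).

%% Hypothetical resident process, assumed to be registered as 'halter'
%% on every target node before the shutdown starts.
halter_loop() ->
    receive
        halt_now -> erlang:halt();
        _Other   -> halter_loop()
    end.

%% Asks the 'halter' of each node to halt, then blocks until a nodedown
%% message has arrived for every node or TimeoutMs has elapsed.
%% Returns ok | {timeout, NodesStillUp}.
stop_nodes(Nodes, TimeoutMs) ->
    %% Monitor before sending, so that no nodedown can be missed.
    lists:foreach(fun(N) -> erlang:monitor_node(N, true) end, Nodes),
    lists:foreach(fun(N) -> {halter, N} ! halt_now end, Nodes),
    Timer = erlang:start_timer(TimeoutMs, self(), shutdown_timeout),
    Result = wait_down(Nodes, Timer),
    erlang:cancel_timer(Timer),
    %% Drop a timeout message that may have raced with the cancel.
    receive {timeout, Timer, _} -> ok after 0 -> ok end,
    Result.

wait_down([], _Timer) ->
    ok;
wait_down(Pending, Timer) ->
    receive
        {nodedown, Node} ->
            wait_down(lists:delete(Node, Pending), Timer);
        {timeout, Timer, shutdown_timeout} ->
            {timeout, Pending}
    end.
```

The coordinator would call something like node_shutdown:stop_nodes(MyTargetNodes, 5000) from a distributed node; the sketch assumes the calling process is not linked to any process on the target nodes.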
Thanks,
Best regards,
Olivier Boudeville.
---------------------------
Olivier Boudeville
EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
Département SINETICS, groupe ASICS (I2A), bureau B-226
Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47 65 27 13
From: vasdeveloper@REDACTED
Date: 26/04/2012 12:34
To: olivier.boudeville@REDACTED
Cc: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Resisting "noconnection" / Remote termination of nodes
Hi Olivier,
In distributed computing, we see the collective effort of individual --
but not independent -- nodes. Communicating nodes need to be connected at
any point in time and should know about each other, including each
other's availability status. The OAM node, or leader, will have the total
picture of all the nodes.
You have to look at the bigger picture of the whole system, when it comes
to stopping a distributed system. The following steps are generally taken
to tear down a system.
* Signal all the nodes of a forthcoming shutdown request
* Stop accepting new requests
* Finish servicing the accepted requests
* Do the stock taking and clean-up
* Check the readiness of all the nodes for shutdown
* Then call shutdown on all the nodes.
Here, I am looking at the system from the top down. The shutdown will
generally be coordinated by a single process.
An ad hoc shutdown in a production environment is scarier to me.
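As an illustration only, the tear-down steps listed above could be driven by a single coordinator along these lines. The module name teardown_plan and the phase names are hypothetical, and the per-node call is passed in as a fun so that the sketch does not assume any particular transport (rpc, plain messages, and so on).

```erlang
-module(teardown_plan).
-export([run/3]).

%% Phases: ordered list of atoms naming the steps, for example
%%   [announce, stop_accepting, drain, cleanup, check_ready, halt]
%% (hypothetical names). Call is fun(Node, Phase) -> ok | {error, Reason}.
%% A phase starts only once every node has acknowledged the previous one.
%% Returns ok | {error, {FailedPhase, [{Node, Result}]}}.
run([], _Nodes, _Call) ->
    ok;
run([Phase | Rest], Nodes, Call) ->
    Results = [{N, Call(N, Phase)} || N <- Nodes],
    case [{N, R} || {N, R} <- Results, R =/= ok] of
        []     -> run(Rest, Nodes, Call);
        Failed -> {error, {Phase, Failed}}
    end.
```

Stopping at the first failed phase keeps the coordinator from halting nodes that have not finished their clean-up; whether to abort or to force the shutdown at that point is a policy decision left to the caller.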
Kind Regards,
Kannan.
On Tue, Apr 24, 2012 at 5:22 PM, Olivier BOUDEVILLE <
olivier.boudeville@REDACTED> wrote:
Hi,
For a more controlled overall termination of a distributed application, I
am trying to shut down a series of nodes synchronously, as cleanly and
with as much parallelism as possible, in a non-OTP program. I imagine that
using '[ rpc:cast( N, erlang, halt, [] ) || N <- MyTargetNodes ]' and then
waiting for them to be terminated is the best approach for that.
As I now want these terminations to be synchronous (i.e. I want my
terminate function to return only when all nodes are down for sure), I
was checking their termination with net_adm:ping/1 (waiting for pong to
become pang), but I systematically got 'noconnection' errors
(exceptions?), which do not seem to be catchable (at least not with a
'try .. catch T:E ->.. end' clause). This happens as soon as there is at
least one node to halt. That node happens to be on the same host; of
course it is not the local node from which the rpc:cast is triggered.
I switched to looping on 'lists:member( Nodename, nodes() )' instead of
ping (in both cases with a proper wait between checks), but I still get
'noconnection' errors. It looks like 'noconnection' is VM-level? As
expected, commenting out the rpc:cast/3 call never leads to 'noconnection'.
I feel I would need something like net_kernel:unconnect_node/1.
My question now: how can I handle such a synchronous node shutdown
gracefully and withstand the (intended) loss of node(s)?
Thanks in advance for any hint!
Best regards,
Olivier.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for
the addressees. The information contained in this Message is confidential.
Any use of information contained in this Message not in accord with its
purpose, any dissemination or disclosure, either whole or partial, is
prohibited except with formal approval.
If you are not the addressee, you may not copy, forward, disclose or use
any part of it. If you have received this message in error, please delete
it and all copies from your system and notify the sender immediately by
return message.
E-mail communication cannot be guaranteed to be timely, secure, error-free
or virus-free.
_______________________________________________
erlang-questions mailing list
erlang-questions@REDACTED
http://erlang.org/mailman/listinfo/erlang-questions