<br><font size=2 face="sans-serif">Hi all,</font>
<br>
<br><font size=2 face="sans-serif">I have a few questions about node disconnections.
Currently I have a distributed application that must survive the crash of
at least some of its hosts. I am first testing the whole feature with ten
remote virtual machines, which I disconnect in software from the user host
at random moments during the execution of the application, thanks to a merciless 'ifconfig
down' of the relevant network interface. </font>
<br>
<br><font size=2 face="sans-serif">It works great, insofar as the communications
then freeze immediately (no surprise). My intent was to monitor, from a
specific process on the user node, each of these 10 worker nodes (using
net_kernel:monitor_nodes/2), to receive the corresponding 'nodedown' messages
and (if monitoring does not prevent 'noconnection' from being triggered) then
to issue a disconnect_node/1 for each of them, so that my user node can
survive these losses. </font>
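<br>
<br><font size=2 face="sans-serif">A minimal sketch of the monitoring process described above. The module name node_watcher and the function names are hypothetical and chosen for illustration only; this is not the actual application code. It assumes that the watcher runs on the user node and that the 'nodedown_reason' option of net_kernel:monitor_nodes/2 is wanted:</font>

```erlang
-module(node_watcher).
-export([start/0]).

%% Hypothetical helper, for illustration only: spawns a process on the
%% local (user) node that subscribes to node status changes.
start() ->
    spawn(fun() ->
        %% With monitor_nodes/2, messages have the form
        %% {nodeup, Node, InfoList} and {nodedown, Node, InfoList}.
        ok = net_kernel:monitor_nodes(true, [nodedown_reason]),
        loop()
    end).

loop() ->
    receive
        {nodedown, Node, Info} ->
            %% The reason is present because nodedown_reason was requested.
            Reason = proplists:get_value(nodedown_reason, Info),
            io:format("Lost node ~p (reason: ~p)~n", [Node, Reason]),
            %% Force the disconnection of the lost node, as intended above.
            _ = erlang:disconnect_node(Node),
            loop();
        {nodeup, _Node, _Info} ->
            loop();
        stop ->
            ok
    end.
```

<font size=2 face="sans-serif">The watcher would be started once with node_watcher:start() before the worker nodes are disconnected.</font>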
<br>
<br><font size=2 face="sans-serif">However, most of the time I cannot intercept
the 'nodedown' information soon enough (or at all), and the whole program
crashes and burns, with a message:</font>
<br>
<br><font size=2 face="sans-serif"> ** Node 'N' not responding ** </font>
<br><font size=2 face="sans-serif"> ** Removing (timedout) connection ** </font>
<br><font size=2 face="sans-serif"> {"init terminating in do_boot",noconnection}</font>
<br>
<br><font size=2 face="sans-serif">So, my question: how can I prevent this
'noconnection' from wreaking havoc, as it seems to ruin our ability to survive
node losses?</font>
<br>
<br><font size=2 face="sans-serif">If I understand correctly, as these reliability
messages have to be managed "out of band", there is always a race
condition between receiving them and telling all processes to stop
interacting with the lost node(s). So if there were no way of at least
temporarily withstanding 'noconnection' (as, whatever we do, there *will* be
processes that attempt to send a message to a lost node), the whole
purpose of the approach would be defeated. Unless I misunderstood something?</font>
<br>
<br><font size=2 face="sans-serif">A related point is that, apparently,
increasing the kernel net_ticktime (say, from 60 to 300 seconds) does not seem
to increase accordingly the 'noconnection' time-out that must exist somewhere.
As a result, I suspect that, by design, node monitoring can only fail in this case
(a 'noconnection' will always happen before the monitoring messages have
a chance to kick in).</font>
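<br>
<br><font size=2 face="sans-serif">For reference, a sketch of one way such a tick time can be set, assuming it is given through a configuration file rather than on the command line (the equivalent command-line form is 'erl -kernel net_ticktime 300'). The value 300 is the one quoted above; the use of a sys.config file is an assumption made for illustration:</font>

```erlang
%% sys.config fragment (illustrative): sets the kernel net_ticktime
%% parameter, expressed in seconds, to 300 instead of the default 60.
[
 {kernel, [
   {net_ticktime, 300}
 ]}
].
```

<font size=2 face="sans-serif">The value actually in use on a running node can be checked with net_kernel:get_net_ticktime/0.</font>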
<br>
<br><font size=2 face="sans-serif">Thanks in advance for any hint!</font>
<br>
<br><font size=2 face="sans-serif">Best regards,</font>
<br><font size=2 face="sans-serif"><br>
Olivier.<br>
---------------------------<br>
Olivier Boudeville<br>
<br>
EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France<br>
Département SINETICS, groupe ASICS (I2A), bureau B-226<br>
Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47
65 27 13</font><p></p>