Is there any explanation for the order of the supervisor children in kernel here? Why is rpc started first?<br><br><div class="gmail_quote">On Fri, Mar 25, 2011 at 6:23 PM, Olivier BOUDEVILLE <span dir="ltr"><<a href="mailto:olivier.boudeville@edf.fr">olivier.boudeville@edf.fr</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br><font size="2" face="sans-serif">Hi,</font>
<br>
<br><font size="2" face="sans-serif">By the way, looking at the article about
Dialyzer's race condition detection capabilities (<a href="http://www.it.uu.se/research/group/hipe/dialyzer/publications/races.pdf" target="_blank">http://www.it.uu.se/research/group/hipe/dialyzer/publications/races.pdf</a>),
I was surprised that what appears to be a race condition involving code_server
was not spotted, whereas it seemed to correspond to the process registry-based
race condition type described in the article (first documented example).</font>
<br>
<br><font size="2" face="sans-serif">Unless it corresponds to one of the
4 "ProcR" conditions established for kernel, and it has not been
fixed yet?<br>
</font><div class="im">
<br><font size="2" face="sans-serif">Best regards,</font>
<br><font size="2" face="sans-serif"><br>
Olivier Boudeville.<br>
---------------------------<br>
Olivier Boudeville<br>
<br>
EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France<br>
Département SINETICS, groupe ASICS (I2A), bureau B-226<br>
Office : <a href="tel:%2B33%201%2047%2065%2059%2058" target="_blank">+33 1 47 65 59 58</a> / Mobile : <a href="tel:%2B33%206%2016%2083%2037%2022" target="_blank">+33 6 16 83 37 22</a> / Fax : +33 1 47
65 27 13</font>
<br>
<br>
<br>
</div><p></p><table width="100%">
<tbody><tr valign="top">
<td width="40%"><font size="1" face="sans-serif"><b><a href="mailto:spawn.think@gmail.com" target="_blank">spawn.think@gmail.com</a></b>
</font>
<br><div class="im"><font size="1" face="sans-serif">Sent by: <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a></font>
</div><p><font size="1" face="sans-serif">21/03/2011 14:46</font>
</p></td><td width="59%">
<table width="100%">
<tbody><tr valign="top">
<td>
<div align="right"><font size="1" face="sans-serif">To</font></div>
</td><td><font size="1" face="sans-serif"><a href="mailto:olivier.boudeville@edf.fr" target="_blank">olivier.boudeville@edf.fr</a></font>
</td></tr><tr valign="top">
<td>
<div align="right"><font size="1" face="sans-serif">cc</font></div>
</td><td><div class="im"><font size="1" face="sans-serif"><a href="mailto:mevans@verivue.com" target="_blank">mevans@verivue.com</a>, <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a></font>
</div></td></tr><tr valign="top">
<td>
<div align="right"><font size="1" face="sans-serif">Subject</font></div>
</td><td><font size="1" face="sans-serif">Re: [erlang-questions] code_server:call/2
problem?</font></td></tr></tbody></table>
<br>
<table>
<tbody><tr valign="top">
<td>
</td><td></td></tr></tbody></table>
<br></td></tr></tbody></table>
<br>
<br>
<br><tt><font size="2"><div><div></div><div class="h5">Hi Olivier,<br>
I think it has to do with the order of the supervisor children in the kernel app.<br>
Currently the rpc server is started before the code_server; I'm not sure if<br>
there's a good reason for that:<br>
<br>
{ok, {SupFlags,<br>
[Rpc, Global, InetDb | DistAC] ++<br>
[NetSup, Glo_grp, File, Code,<br>
StdError, User, Config, SafeSupervisor] ++ Timer}}<br>
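<br>
For anyone who wants to check this on a running node, here is a minimal sketch that lists the ids of the children of kernel_sup via supervisor:which_children/1. The module name kernel_children is made up for the example. As far as I know, the order of the returned list is not documented, so the child specs in lib/kernel/src/kernel.erl remain the reference for the actual start order.<br>
<pre>
```erlang
%% Sketch only: the module name is hypothetical.
-module(kernel_children).
-export([ids/0]).

%% Returns the child ids known to the local kernel supervisor.
%% Each element returned by which_children/1 is a tuple
%% {Id, Child, Type, Modules}; only the Id is kept here.
ids() ->
    Children = supervisor:which_children(kernel_sup),
    lists:map(fun({Id, _Child, _Type, _Modules}) -> Id end, Children).
```
</pre>
Calling kernel_children:ids() from a shell returns a list of child ids; the exact set depends on the OTP release and on how the node was started.<br>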
<br>
<br>
On Mon, Mar 21, 2011 at 1:22 PM, Olivier BOUDEVILLE <<br>
<a href="mailto:olivier.boudeville@edf.fr" target="_blank">olivier.boudeville@edf.fr</a>> wrote:<br>
<br>
> Hi,<br>
><br>
> Thanks for your answer. Indeed, this could have been an explanation; however,<br>
> the node crashes after that error, neither before nor "in parallel" to the<br>
> error.<br>
><br>
> Actually I believe there is a bug in the Erlang runtime. I strongly<br>
> suspect there is a small time window during which a race condition can<br>
> occur: apparently code:load_binary can be triggered (thanks to<br>
> rpc:multicall) on a just-launched node before at least one of its system<br>
> processes succeeds in registering its name. At least that's what I came to<br>
> think after having looked at lib/kernel/src/code_server.erl: the badarg<br>
> that occurred may come from the fact that call/2 is called while Name is<br>
> not registered (yet), in:<br>
><br>
> """<br>
> call(Name, Req) -><br>
> Name ! {code_call, self(), Req},<br>
> receive<br>
> {?MODULE, Reply} -><br>
> Reply<br>
> end.<br>
> """<br>
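<br>
To illustrate the suspected failure mode: sending with ! to an atom that is not registered as a process name raises badarg, which matches the error reported from code_server:call/2. Below is a minimal sketch; the module and function names are hypothetical and only serve the example.<br>
<pre>
```erlang
%% Sketch only: module and function names are hypothetical.
-module(send_badarg).
-export([try_send/1]).

%% Sends the atom ping to the locally registered name Name.
%% Returns ok if the send succeeded, or {error, badarg} if no
%% process is registered under Name on this node.
try_send(Name) when is_atom(Name) ->
    try
        Name ! ping,
        ok
    catch
        error:badarg ->
            {error, badarg}
    end.
```
</pre>
For example, send_badarg:try_send(some_name_nobody_registered) returns {error,badarg}, since nothing is registered under that (made-up) name.<br>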
><br>
> As a test: the non-systematic crash, which on our short test case (run on<br>
> Ubuntu 64-bit on a 4-core Core i7 laptop) took on average<br>
> 30 seconds (a loop of ~15 attempts) to happen, never happened with<br>
> the same loop run for more than one hour once I inserted a<br>
> timer:sleep(1000) in my deployment manager between the launch of the<br>
> remote VM and the call to rpc:multicall (knowing that intermediate<br>
> checks, like the Erlang ping of the remote node and the check of the remote<br>
> Erlang version, always succeeded).<br>
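<br>
Instead of a fixed sleep, one possible workaround is to poll the new node until a process is registered there as code_server, and only then call rpc:multicall. This is just a sketch under assumptions: the module name, the retry count and the 100 ms delay are arbitrary choices for the example, and it assumes that the rpc server on the remote node already answers (which is the case in the scenario described, since the badarg comes back through rpc).<br>
<pre>
```erlang
%% Sketch only: module name, retry count and delay are assumptions.
-module(wait_code_server).
-export([wait/2]).

%% Polls Node until a process is registered there as code_server.
%% Gives up after Retries attempts, sleeping 100 ms between attempts.
%% Returns ok, or {error, timeout} if the name never showed up.
wait(_Node, 0) ->
    {error, timeout};
wait(Node, Retries) when is_integer(Retries), Retries > 0 ->
    case rpc:call(Node, erlang, whereis, [code_server]) of
        Pid when is_pid(Pid) ->
            ok;
        _NotYet ->
            %% Either undefined (not registered yet) or {badrpc, Reason}.
            timer:sleep(100),
            wait(Node, Retries - 1)
    end.
```
</pre>
A deployment manager could call wait_code_server:wait(Node, 50) for each freshly launched node and send the modules only to the nodes for which it returns ok.<br>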
><br>
> I suppose there is in the runtime a kind of synchronous barrier where all<br>
> system processes are checked to be up and ready (including appropriately<br>
> registered) before serving user-space requests, but probably at least<br>
> one system process was forgotten, which led to such a race condition.<br>
> Unless I am mistaken?<br>
><br>
> Thanks in advance for any answer,<br>
> Best regards,<br>
><br>
> Olivier.<br>
> ---------------------------<br>
> Olivier Boudeville<br>
><br>
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France<br>
> Département SINETICS, groupe ASICS (I2A), bureau B-226<br>
> Office : <a href="tel:%2B33%201%2047%2065%2059%2058" target="_blank">+33 1 47 65 59 58</a> / Mobile : <a href="tel:%2B33%206%2016%2083%2037%2022" target="_blank">+33 6 16 83 37 22</a> / Fax : +33
1 47<br>
> 65 27 13<br>
><br>
><br>
><br>
> <a href="mailto:mevans@verivue.com" target="_blank">mevans@verivue.com</a><br>
> Sent by: <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>
> 18/03/2011 19:22<br>
><br>
> To<br>
> <a href="mailto:olivier.boudeville@edf.fr" target="_blank">olivier.boudeville@edf.fr</a>, <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>
> cc<br>
><br>
> Subject<br>
> RE: [erlang-questions] code_server:call/2 problem?<br>
><br>
><br>
><br>
><br>
><br>
><br>
> Do you see any crashes on the remote nodes?<br>
><br>
> It does look like the remote code_server has received a request,<br>
> but for some reason it fails (corrupted data perhaps?). I'm wondering if<br>
> you could attach a remote shell to one of those nodes and trace the<br>
> code_server module?<br>
><br>
> -----Original Message-----<br>
> From: <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a> [mailto:<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>]
On<br>
> Behalf Of Olivier BOUDEVILLE<br>
> Sent: Friday, March 18, 2011 1:32 PM<br>
> To: <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>
> Subject: [erlang-questions] code_server:call/2 problem?<br>
><br>
> Hi,<br>
><br>
> We are running a distributed Erlang program on a user node from which a<br>
> number of computing nodes are spawned, via SSH for the remote hosts. To<br>
> perform the automatic deployment, two deployment-related modules are sent<br>
> to each of the spawned nodes, using the traditional approach (first a call<br>
> to code:get_object_code/1, then an rpc:multicall of code:load_binary).<br>
><br>
> However, sometimes (not frequently), with the exact same settings,
the<br>
> first module cannot be deployed successfully. We have indeed:<br>
><br>
> {ResList,BadNodes} = rpc:multicall( Nodes, code, load_binary, [<br>
> ModuleName, ModuleFilename, ModuleBinary ], Timeout ),<br>
><br>
> that returns:<br>
> ResList =<br>
><br>
> [{badrpc,{'EXIT',{badarg,[{code_server,call,2},{rpc,'-handle_call_call/6-fun-0-',5}]}}}]<br>
> BadNodes = []<br>
><br>
> This happens with R14B02, but most probably with previous versions as<br>
> well.<br>
> Apparently this happens often (always?) on a node created on the user<br>
> host.<br>
> I am pretty sure the deployed node is "fresh" (blank, vanilla).<br>
> And ignoring the badrpc will result in an undef error as soon as the first<br>
> function of the first helper module is called, even when delaying the call<br>
> (a race condition was suspected, in case the actual loading was<br>
> asynchronous).<br>
><br>
> Would anyone see a cause for such a badarg non-systematic error?<br>
><br>
> Thanks in advance for any hint,<br>
> Best regards,<br>
><br>
> Olivier Boudeville.<br>
> ---------------------------<br>
> Olivier Boudeville<br>
><br>
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France<br>
> Département SINETICS, groupe ASICS (I2A), bureau B-226<br>
> Office : <a href="tel:%2B33%201%2047%2065%2059%2058" target="_blank">+33 1 47 65 59 58</a> / Mobile : <a href="tel:%2B33%206%2016%2083%2037%2022" target="_blank">+33 6 16 83 37 22</a> / Fax : +33
1 47<br>
> 65 27 13<br>
><br>
><br>
><br>
> This message and any attachments (the 'Message') are intended solely
for<br>
> the addressees. The information contained in this Message is confidential.<br>
> Any use of information contained in this Message not in accord with
its<br>
> purpose, any dissemination or disclosure, either whole or partial,
is<br>
> prohibited except formal approval.<br>
><br>
> If you are not the addressee, you may not copy, forward, disclose
or use<br>
> any part of it. If you have received this message in error, please
delete<br>
> it and all copies from your system and notify the sender immediately
by<br>
> return message.<br>
><br>
> E-mail communication cannot be guaranteed to be timely secure, error
or<br>
> virus-free.<br>
><br>
> ________________________________________________________________<br>
> erlang-questions (at) <a href="http://erlang.org" target="_blank">erlang.org</a> mailing list.<br>
> See <a href="http://www.erlang.org/faq.html" target="_blank">http://www.erlang.org/faq.html</a><br>
> To unsubscribe; mailto:<a href="mailto:erlang-questions-unsubscribe@erlang.org" target="_blank">erlang-questions-unsubscribe@erlang.org</a><br>
><br>
><br>
><br>
><br>
><br>
><br>
<br>
<br>
<br>
-- <br>
Best Regards,<br>
- Ahmed Omar<br>
<a href="http://nl.linkedin.com/in/adiaa" target="_blank">http://nl.linkedin.com/in/adiaa</a><br>
Follow me on twitter<br></div></div><div class="im">
@spawn_think <<a href="http://twitter.com/#!/spawn_think" target="_blank">http://twitter.com/#!/spawn_think</a>><br>
</div></font></tt>
<br><font face="monospace"><br>
<br><div class="im">
<br>
</div></font><br>_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br>Best Regards,<br>- Ahmed Omar<div><a href="http://nl.linkedin.com/in/adiaa" target="_blank">http://nl.linkedin.com/in/adiaa</a></div><div>Follow me on twitter</div><div>
<a href="http://twitter.com/#!/spawn_think" target="_blank">@spawn_think</a></div><br>