[erlang-questions] Re: [erlang-questions 8] Re: code_server:call/2 problem?

Ahmed Omar spawn.think@REDACTED
Tue Mar 29 14:59:52 CEST 2011


Is there any explanation for the order of the supervisor's children in
kernel here? Why is rpc started first?

On Fri, Mar 25, 2011 at 6:23 PM, Olivier BOUDEVILLE <
olivier.boudeville@REDACTED> wrote:

>
> Hi,
>
> By the way, looking at the article about Dialyzer's race condition
> detection capabilities (
> http://www.it.uu.se/research/group/hipe/dialyzer/publications/races.pdf),
> I was surprised that what appears to be a race condition involving
> code_server was not spotted, even though it seems to correspond to the
> process registry-based type of race condition described in the article
> (the first documented example).
>
> Unless it corresponds to one of the 4 "ProcR" conditions established for
> kernel, and it has not been fixed yet?
>
> Best regards,
>
> Olivier Boudeville.
> ---------------------------
> Olivier Boudeville
>
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
> Département SINETICS, groupe ASICS (I2A), bureau B-226
> Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47
> 65 27 13
>
>
> From: spawn.think@REDACTED
> Sent by: erlang-questions@REDACTED
> Date: 21/03/2011 14:46
> To: olivier.boudeville@REDACTED
> Cc: mevans@REDACTED, erlang-questions@REDACTED
> Subject: Re: [erlang-questions] code_server:call/2 problem?
>
>
>
>
> Hi Olivier,
> I think it has to do with the order of the supervisor's children in the
> kernel app. Currently the rpc server is started before the code_server;
> I'm not sure if there's a good reason for that.
>
> {ok, {SupFlags,
>  [Rpc, Global, InetDb | DistAC] ++
>  [NetSup, Glo_grp, File, Code,
>   StdError, User, Config, SafeSupervisor] ++ Timer}}
>
>
> On Mon, Mar 21, 2011 at 1:22 PM, Olivier BOUDEVILLE <
> olivier.boudeville@REDACTED> wrote:
>
> > Hi,
> >
> > Thanks for your answer. Indeed, this could have been an explanation;
> > however, the node crashes after that error, neither before nor "in
> > parallel" to it.
> >
> > Actually I believe there is a bug in the Erlang runtime. I strongly
> > suspect there is a small time window during which a race condition can
> > occur: apparently code:load_binary can be triggered (thanks to
> > rpc:multicall) on a just-launched node before at least one of its system
> > processes succeeds in registering its name. At least that is what I came
> > to think after having peered at lib/kernel/src/code_server.erl: the
> > badarg that occurred may come from the fact that call/2 is called while
> > Name is not registered (yet), in:
> >
> > """
> > call(Name, Req) ->
> >     Name ! {code_call, self(), Req},
> >     receive
> >         {?MODULE, Reply} ->
> >             Reply
> >     end.
> > """
> >
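To make the suspected failure mode concrete: sending to an atom that is not a registered name raises badarg, which matches the {badarg,[{code_server,call,2},...]} shape reported in this thread. The following is only a minimal sketch, not the OTP code; the module name, the function names and the atom hypothetical_unregistered_name are made up for illustration.

```erlang
-module(send_badarg_demo).
-export([send_to_unregistered/0, send_if_registered/2]).

%% Sending with '!' to an atom that is not registered raises error:badarg.
%% hypothetical_unregistered_name is assumed not to be registered locally.
send_to_unregistered() ->
    Name = hypothetical_unregistered_name,
    try
        Name ! {code_call, self(), ping},
        sent
    catch
        error:badarg ->
            {error, not_registered}
    end.

%% Defensive variant (an illustration, not how code_server is written):
%% resolve the name first and report noproc instead of crashing.
send_if_registered(Name, Msg) ->
    case whereis(Name) of
        undefined ->
            {error, noproc};
        Pid when is_pid(Pid) ->
            %% Sending to a pid never raises, even if the process has died.
            Pid ! Msg,
            ok
    end.
```

Resolving the name with whereis/1 before sending turns the crash into a value the caller can act on, for instance by retrying a little later.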
> > As a test: on our short test case (run on Ubuntu 64-bit on a 4-core
> > Core i7 laptop), the non-systematic crash took on average 30 seconds
> > (a loop of ~15 attempts) to happen. It never happened with the same loop
> > running for more than one hour once I inserted a timer:sleep(1000) in
> > my deployment manager between the launch of the remote VM and the call
> > to rpc:multicall (knowing that intermediate checks, such as an Erlang
> > ping of the remote node and a check of the remote Erlang version,
> > always succeeded).
> >
> > I suppose the runtime has a kind of synchronous barrier where all
> > system processes are checked to be up and ready (including being
> > appropriately registered) before user-space requests are served, but
> > that at least one system process was probably forgotten, which led to
> > this race condition. Unless I am mistaken?
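Instead of a fixed timer:sleep(1000), one could poll the remote node until code_server is registered there before calling rpc:multicall. This is only a sketch under assumptions: the module name wait_registered and the function wait_for/4 are hypothetical, and seeing the name registered does not by itself prove that the whole kernel application has finished starting.

```erlang
-module(wait_registered).
-export([wait_for/4]).

%% Poll Node until Name is registered there, trying at most Retries times
%% and sleeping DelayMs milliseconds between attempts.
%% Returns ok, or {error, timeout} once the attempts are exhausted.
wait_for(_Node, _Name, 0, _DelayMs) ->
    {error, timeout};
wait_for(Node, Name, Retries, DelayMs) when Retries > 0 ->
    case rpc:call(Node, erlang, whereis, [Name]) of
        Pid when is_pid(Pid) ->
            ok;
        _UndefinedOrBadRpc ->
            %% Either 'undefined' (not registered yet) or {badrpc, Reason}.
            timer:sleep(DelayMs),
            wait_for(Node, Name, Retries - 1, DelayMs)
    end.
```

A deployment manager could then call something like wait_registered:wait_for(Node, code_server, 50, 100) for each freshly launched node and only proceed to rpc:multicall once it returns ok.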
> >
> > Thanks in advance for any answer,
> > Best regards,
> >
> > Olivier.
> >
> >
> >
> > From: mevans@REDACTED
> > Sent by: erlang-questions@REDACTED
> > Date: 18/03/2011 19:22
> > To: olivier.boudeville@REDACTED, erlang-questions@REDACTED
> > Subject: RE: [erlang-questions] code_server:call/2 problem?
> >
> >
> >
> >
> >
> >
> > Do you see any crashes on the remote nodes?
> >
> > It does look like the remote code_server has received a request that,
> > for some reason, fails (corrupted data perhaps?). I'm wondering if
> > you could attach a remote shell to one of those nodes and trace the
> > code_server module?
> >
> > -----Original Message-----
> > From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED]
> > On Behalf Of Olivier BOUDEVILLE
> > Sent: Friday, March 18, 2011 1:32 PM
> > To: erlang-questions@REDACTED
> > Subject: [erlang-questions] code_server:call/2 problem?
> >
> > Hi,
> >
> > We are running a distributed Erlang program on a user node from which a
> > number of computing nodes are spawned, via SSH for the remote hosts. To
> > perform the automatic deployment, two deployment-related modules are sent
> > to each of the spawned nodes, using the traditional approach (first a
> > call to code:get_object_code/1, then an rpc:multicall of
> > code:load_binary).
> >
> > However, sometimes (not frequently), with the exact same settings, the
> > first module cannot be deployed successfully. We have indeed:
> >
> > {ResList, BadNodes} = rpc:multicall(Nodes, code, load_binary,
> >     [ModuleName, ModuleFilename, ModuleBinary], Timeout),
> >
> > that returns:
> > ResList =
> > [{badrpc,{'EXIT',{badarg,[{code_server,call,2},{rpc,'-handle_call_call/6-fun-0-',5}]}}}]
> > BadNodes = []
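Since BadNodes is empty here while the failure sits inside ResList as a {badrpc, ...} term, the deployment step has to inspect both. A minimal sketch of that check follows, assuming a hypothetical module deploy_sketch; it only restates the get_object_code/load_binary approach described in this thread and is not the actual deployment manager.

```erlang
-module(deploy_sketch).
-export([deploy/3, check_results/3]).

%% Fetch the object code of Module locally, load it on every node in Nodes,
%% and return ok only if every node answered {module, Module}.
deploy(Module, Nodes, Timeout) ->
    case code:get_object_code(Module) of
        {Module, Binary, Filename} ->
            {ResList, BadNodes} =
                rpc:multicall(Nodes, code, load_binary,
                              [Module, Filename, Binary], Timeout),
            check_results(Module, ResList, BadNodes);
        error ->
            {error, {no_object_code, Module}}
    end.

%% Anything other than {module, Module} counts as a failure, including
%% {badrpc, Reason} and {error, Reason} results.
check_results(Module, ResList, BadNodes) ->
    Failures = [R || R <- ResList, R =/= {module, Module}],
    case {Failures, BadNodes} of
        {[], []} -> ok;
        _ -> {error, {Failures, BadNodes}}
    end.
```

With the result reported above, check_results/3 would return an error tuple carrying the badrpc term, so the caller can retry or abort instead of hitting undef later.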
> >
> > This happens with R14B02, but most probably with previous versions as
> > well.
> > Apparently this happens often (always?) on a node created on the user
> > host.
> > I am pretty sure the deployed node is "fresh" (blank, vanilla).
> > Ignoring the badrpc results in an undef error as soon as the first
> > function of the first helper module is called, even if the call is
> > delayed (a race condition was suspected in case the actual loading was
> > asynchronous).
> >
> > Would anyone see a cause for such a badarg non-systematic error?
> >
> > Thanks in advance for any hint,
> > Best regards,
> >
> > Olivier Boudeville.
> >
> >
> >
> > Ce message et toutes les pièces jointes (ci-après le 'Message') sont
> > établis à l'intention exclusive des destinataires et les informations qui
> > y figurent sont strictement confidentielles. Toute utilisation de ce
> > Message non conforme à sa destination, toute diffusion ou toute
> > publication totale ou partielle, est interdite sauf autorisation
> expresse.
> >
> > Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de
> > le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou
> > partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de
> > votre système, ainsi que toutes ses copies, et de n'en garder aucune
> trace
> > sur quelque support que ce soit. Nous vous remercions également d'en
> > avertir immédiatement l'expéditeur par retour du message.
> >
> > Il est impossible de garantir que les communications par messagerie
> > électronique arrivent en temps utile, sont sécurisées ou dénuées de toute
> > erreur ou virus.
> > ____________________________________________________
> >
> > This message and any attachments (the 'Message') are intended solely for
> > the addressees. The information contained in this Message is
> confidential.
> > Any use of information contained in this Message not in accord with its
> > purpose, any dissemination or disclosure, either whole or partial, is
> > prohibited except formal approval.
> >
> > If you are not the addressee, you may not copy, forward, disclose or use
> > any part of it. If you have received this message in error, please delete
> > it and all copies from your system and notify the sender immediately by
> > return message.
> >
> > E-mail communication cannot be guaranteed to be timely secure, error or
> > virus-free.
> >
> > ________________________________________________________________
> > erlang-questions (at) erlang.org mailing list.
> > See http://www.erlang.org/faq.html
> > To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Best Regards,
> - Ahmed Omar
> http://nl.linkedin.com/in/adiaa
> Follow me on twitter
> @spawn_think <http://twitter.com/#!/spawn_think>
>
>
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>


-- 
Best Regards,
- Ahmed Omar
http://nl.linkedin.com/in/adiaa
Follow me on twitter
@spawn_think <http://twitter.com/#!/spawn_think>