[erlang-questions] code_server:call/2 problem?

Ahmed Omar spawn.think@REDACTED
Mon Mar 21 14:43:54 CET 2011


Hi Olivier,
I think it has to do with the order of the supervisor children in the kernel
application. Currently the rpc server is started before the code_server; I'm
not sure whether there is a good reason for that:

{ok, {SupFlags,
  [Rpc, Global, InetDb | DistAC] ++
  [NetSup, Glo_grp, File, Code,
   StdError, User, Config, SafeSupervisor] ++ Timer}}
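
Given that ordering, one possible workaround on the deploying side is to wait
until code_server is registered on each new node before issuing the
rpc:multicall. Below is a minimal sketch, not tested against this particular
race. The module name wait_code_server, the function names, and the retry
count and delay are made up for illustration, and it assumes that a
registered code_server is also ready to serve requests:

```erlang
-module(wait_code_server).
-export([wait_registered/4, deploy/3]).

%% Poll Node until Name is registered there, at most Retries times,
%% sleeping DelayMs milliseconds between attempts.
%% Returns ok | {error, timeout}.
wait_registered(_Node, _Name, 0, _DelayMs) ->
    {error, timeout};
wait_registered(Node, Name, Retries, DelayMs) when Retries > 0 ->
    case rpc:call(Node, erlang, whereis, [Name]) of
        Pid when is_pid(Pid) ->
            ok;
        _NotYet ->
            %% undefined, or {badrpc, _} while the node is still booting
            timer:sleep(DelayMs),
            wait_registered(Node, Name, Retries - 1, DelayMs)
    end.

%% Send Module to every node whose code_server showed up in time.
%% Crashes with badmatch if the object code of Module cannot be found.
%% The values 50 and 100 are arbitrary (about 5 seconds in total).
deploy(Nodes, Module, Timeout) ->
    {Module, Binary, Filename} = code:get_object_code(Module),
    Ready = [N || N <- Nodes,
                  wait_registered(N, code_server, 50, 100) =:= ok],
    rpc:multicall(Ready, code, load_binary,
                  [Module, Filename, Binary], Timeout).
```

Polling replaces the fixed timer:sleep(1000) with a bounded wait, so the
deployment proceeds as soon as the remote code_server is there.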


On Mon, Mar 21, 2011 at 1:22 PM, Olivier BOUDEVILLE <
olivier.boudeville@REDACTED> wrote:

> Hi,
>
> Thanks for your answer. Indeed, this could have been an explanation;
> however the node is crashing after that error, not before nor "in parallel"
> to the error.
>
> Actually I believe there is a bug in the Erlang runtime. I strongly
> suspect there is a small time window during which a race condition can
> occur: apparently code:load_binary can be triggered (thanks to
> rpc:multicall) on a just-launched node before at least one of its system
> processes succeeds in registering its name. At least that's what I came to
> think after having looked at lib/kernel/src/code_server.erl: the badarg
> that occurred may come from call/2 being called while Name is not
> registered (yet), in:
>
> """
> call(Name, Req) ->
>     Name ! {code_call, self(), Req},
>     receive
>         {?MODULE, Reply} ->
>             Reply
>     end.
> """
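
For illustration: sending with ! to an atom that is not a registered name
raises badarg, which would be consistent with the badarg reported in
code_server:call/2. A minimal sketch (the module badarg_demo and the function
send_to/1 are hypothetical names, not part of OTP):

```erlang
-module(badarg_demo).
-export([send_to/1]).

%% Try to send a message to a locally registered name.
%% If no process is registered under Name, the send itself
%% fails with a badarg error, which is caught here.
send_to(Name) when is_atom(Name) ->
    try Name ! {demo, self()} of
        _Msg -> sent
    catch
        error:badarg -> {error, not_registered}
    end.
```
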
>
> As a test: on our short test case (run on Ubuntu 64-bit on a 4-core Core
> i7 laptop), the intermittent crash took on average 30 seconds (a loop of
> ~15 attempts) to appear. Once I inserted a timer:sleep(1000) in my
> deployment manager between the launch of the remote VM and the call to
> rpc:multicall, it never happened again, with the same loop running for
> more than one hour. (Intermediate checks, such as an Erlang ping of the
> remote node and a check of the remote Erlang version, always succeeded.)
>
> I suppose the runtime has some kind of synchronization barrier ensuring
> that all system processes are up and ready (including properly registered)
> before user-space requests are served, but at least one system process was
> probably left out of it, leading to such a race condition.
> Unless I am mistaken?
>
> Thanks in advance for any answer,
> Best regards,
>
> Olivier.
> ---------------------------
> Olivier Boudeville
>
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
> Département SINETICS, groupe ASICS (I2A), bureau B-226
> Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47
> 65 27 13
>
>
> mevans@REDACTED
> Sent by: erlang-questions@REDACTED
> 18/03/2011 19:22
>
> To: olivier.boudeville@REDACTED, erlang-questions@REDACTED
> Cc:
> Subject: RE: [erlang-questions] code_server:call/2 problem?
>
> Do you see any crashes on the remote nodes?
>
> It does look like the remote code_server application has received a
> request, but for some reason it fails (corrupted data perhaps?). I'm
> wondering if
> you could attach a remote shell to one of those nodes and trace the
> code_server module?
>
> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
> Behalf Of Olivier BOUDEVILLE
> Sent: Friday, March 18, 2011 1:32 PM
> To: erlang-questions@REDACTED
> Subject: [erlang-questions] code_server:call/2 problem?
>
> Hi,
>
> We are running a distributed Erlang program on a user node from which a
> number of computing nodes are spawned, via SSH for the remote hosts. To
> perform the automatic deployment, two deployment-related modules are sent
> to each of the spawned nodes, using the traditional approach (first a call
> to code:get_object_code/1, then an rpc:multicall of code:load_binary).
>
> However, sometimes (not frequently), with the exact same settings, the
> first module cannot be deployed successfully. We have indeed:
>
> {ResList,BadNodes} = rpc:multicall( Nodes, code, load_binary, [
> ModuleName, ModuleFilename, ModuleBinary ], Timeout ),
>
> that returns:
>
> ResList = [{badrpc,{'EXIT',{badarg,[{code_server,call,2},{rpc,'-handle_call_call/6-fun-0-',5}]}}}]
> BadNodes = []
>
> This happens with R14B02, but most probably with previous versions as
> well.
> Apparently this happens often (always?) on a node created on the user
> host.
> I am pretty sure the deployed node is "fresh" (blank, vanilla).
> Ignoring the badrpc results in an undef error as soon as the first
> function of the first helper module is called, even when that call is
> delayed (we had suspected a race condition, in case the actual loading was
> asynchronous).
>
> Does anyone see a possible cause for such an intermittent badarg error?
>
> Thanks in advance for any hint,
> Best regards,
>
> Olivier Boudeville.
>
> This message and any attachments (the 'Message') are intended solely for
> the addressees. The information contained in this Message is confidential.
> Any use of information contained in this Message not in accord with its
> purpose, any dissemination or disclosure, either whole or partial, is
> prohibited except formal approval.
>
> If you are not the addressee, you may not copy, forward, disclose or use
> any part of it. If you have received this message in error, please delete
> it and all copies from your system and notify the sender immediately by
> return message.
>
> E-mail communication cannot be guaranteed to be timely secure, error or
> virus-free.



-- 
Best Regards,
- Ahmed Omar
http://nl.linkedin.com/in/adiaa
Follow me on twitter
@spawn_think <http://twitter.com/#!/spawn_think>

