[erlang-questions] code_server:call/2 problem?

Olivier BOUDEVILLE <>
Tue Mar 22 10:03:03 CET 2011


Hi,

Thanks for that information. Indeed, shouldn't the kernel ensure that the 
code_server is not only started but also already up and running before 
putting the rpc server online? Is there any Erlang maintainer willing to 
comment on that?

Otherwise I guess the user code would have to poll the process registry 
until the appropriate name (code_server or the like) is registered before 
triggering any code-related operation, which would be a bit 
awkward/unfortunate!
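
A minimal sketch of such polling, under stated assumptions: the module name
wait_code_server, the function wait_for_registered/4 and its retry/delay
parameters are hypothetical, introduced only for illustration. It relies on
the rpc server of the remote node already answering (which is what the
thread suggests), and it only checks that the name is registered, not that
the process has finished initialising.

```erlang
-module(wait_code_server).
-export([wait_for_registered/4]).

%% Polls Node until Name is registered there, or until Retries attempts
%% have been used up. Hypothetical helper, not part of OTP.
%% Returns ok | {error, timeout} | {error, {badrpc, Reason}}.
wait_for_registered(_Node, _Name, 0, _DelayMs) ->
    {error, timeout};
wait_for_registered(Node, Name, Retries, DelayMs) when Retries > 0 ->
    %% erlang:whereis/1 is evaluated on the remote node via rpc.
    case rpc:call(Node, erlang, whereis, [Name]) of
        Pid when is_pid(Pid) ->
            ok;
        undefined ->
            %% Not registered yet: wait a little, then try again.
            timer:sleep(DelayMs),
            wait_for_registered(Node, Name, Retries - 1, DelayMs);
        {badrpc, Reason} ->
            {error, {badrpc, Reason}}
    end.
```

A deployment manager could then call, for instance,
wait_code_server:wait_for_registered(Node, code_server, 50, 100) for each
node before issuing the rpc:multicall, instead of a fixed timer:sleep/1.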

I suppose that rpc depends on code_server (and on many other services), 
but not the other way round? Maybe it would be worth establishing a 
dependency graph of the services to ensure that their startup is properly 
synchronised? (might be overkill)

Thanks in advance for any hint,
Best regards,

Olivier.
---------------------------
Olivier Boudeville

EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
Département SINETICS, groupe ASICS (I2A), bureau B-226
Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47 65 27 13



 
Sent: 21/03/2011 14:46
Subject: Re: [erlang-questions] code_server:call/2 problem?

Hi Olivier,
I think it has to do with the order of the supervisor's children in the
kernel application. Currently the rpc server is started before the
code_server; I'm not sure whether there's a good reason for that:

{ok, {SupFlags,
  [Rpc, Global, InetDb | DistAC] ++
  [NetSup, Glo_grp, File, Code,
   StdError, User, Config, SafeSupervisor] ++ Timer}}
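
With that ordering, a request can reach rpc before the name code_server is
registered, and in Erlang sending to an atom that is not a registered name
raises badarg. A small sketch of that behaviour follows; the module name
badarg_demo and the atom used are hypothetical, and the atom is assumed not
to be registered on the node where this runs.

```erlang
-module(badarg_demo).
-export([send_to_unregistered/0]).

%% Sends a message to an atom that is assumed not to be a registered name
%% and reports what happened.
send_to_unregistered() ->
    Name = some_name_assumed_unregistered,   % hypothetical, unregistered name
    undefined = whereis(Name),               % check the assumption first
    try Name ! {code_call, self(), ping} of
        _ -> sent
    catch
        %% The send operator raises badarg for an unregistered atom.
        error:badarg -> badarg
    end.
```

Calling badarg_demo:send_to_unregistered() returns badarg, which matches
the reason reported in the badrpc tuple.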


On Mon, Mar 21, 2011 at 1:22 PM, Olivier BOUDEVILLE wrote:

> Hi,
>
> Thanks for your answer. Indeed, this could have been the explanation;
> however the node is crashing after that error, not before nor "in
> parallel" to the error.
>
> Actually I believe there is a bug in the Erlang runtime. I strongly
> suspect there is a small time window during which a race condition can
> occur: apparently code:load_binary can be triggered (thanks to
> rpc:multicall) on a just-launched node before at least one of its system
> processes succeeds in registering its name. At least that's what I came
> to think after having looked at lib/kernel/src/code_server.erl: the badarg
> that occurred may come from the fact that call/2 is called while Name is
> not registered (yet), in:
>
> """
> call(Name, Req) ->
>     Name ! {code_call, self(), Req},
>     receive
>         {?MODULE, Reply} ->
>             Reply
>     end.
> """
>
> As a test: the non-systematic crash, which on our short test case (done on
> Ubuntu 64-bit running on a 4-core Core i7 laptop) took on average
> 30 seconds (a loop of ~15 attempts) to happen, never happened with
> the same loop running for more than one hour once I inserted a
> timer:sleep(1000) in my deployment manager between the launching of the
> remote VM and the call to rpc:multicall (knowing that intermediate
> checks, like an Erlang ping of the remote node and a check of the remote
> Erlang version, always succeeded).
>
> I suppose the runtime has a kind of synchronisation barrier where all
> system processes are checked to be up and ready (including appropriately
> registered) before user-space requests are served, but that at least one
> system process was probably forgotten, which led to this race condition.
> Unless I am mistaken?
>
> Thanks in advance for any answer,
> Best regards,
>
> Olivier.
>
>
>
> 
> Sent: 18/03/2011 19:22
> Subject: RE: [erlang-questions] code_server:call/2 problem?
>
> Do you see any crashes on the remote nodes?
>
> It does look like the remote code_server application has got a request,
> but for some reason it fails (corrupted data perhaps?). I'm wondering if
> you could attach a remote shell to one of those nodes and trace the
> code_server module?
>
> -----Original Message-----
> On behalf of: Olivier BOUDEVILLE
> Sent: Friday, March 18, 2011 1:32 PM
> Subject: [erlang-questions] code_server:call/2 problem?
>
> Hi,
>
> We are running a distributed Erlang program on a user node from which a
> number of computing nodes are spawned, via SSH for the remote hosts. To
> perform the automatic deployment, two deployment-related modules are sent
> to each of the spawned nodes, using the traditional approach (first a call
> to code:get_object_code/1, then an rpc:multicall of code:load_binary).
>
> However, sometimes (not frequently), with the exact same settings, the
> first module cannot be deployed successfully. We have indeed:
>
> {ResList,BadNodes} = rpc:multicall( Nodes, code, load_binary, [
> ModuleName, ModuleFilename, ModuleBinary ], Timeout ),
>
> that returns:
> ResList = [{badrpc,{'EXIT',{badarg,[{code_server,call,2},{rpc,'-handle_call_call/6-fun-0-',5}]}}}]
> BadNodes = []
>
> This happens with R14B02, but most probably with previous versions as
> well.
> Apparently, when it happens, it is often (always?) on a node created on
> the user host.
> I am pretty sure the deployed node is "fresh" (blank, vanilla).
> And ignoring the badrpc will result in an undef error as soon as the first
> function of the first helper module is called, even if the call is delayed
> (a race condition was suspected, in case the actual loading was
> asynchronous).
>
> Can anyone see a cause for such a non-systematic badarg error?
>
> Thanks in advance for any hint,
> Best regards,
>
> Olivier Boudeville.
>
>
>
> This message and any attachments (the 'Message') are intended solely for
> the addressees. The information contained in this Message is confidential.
> Any use of information contained in this Message not in accord with its
> purpose, any dissemination or disclosure, either whole or partial, is
> prohibited except formal approval.
>
> If you are not the addressee, you may not copy, forward, disclose or use
> any part of it. If you have received this message in error, please delete
> it and all copies from your system and notify the sender immediately by
> return message.
>
> E-mail communication cannot be guaranteed to be timely secure, error or
> virus-free.
>



-- 
Best Regards,
- Ahmed Omar
http://nl.linkedin.com/in/adiaa
Follow me on twitter
@spawn_think <http://twitter.com/#!/spawn_think>



