Erlang crashes on slave:start in Gentoo

"Gösta Ask (Mobile Arts)" gosta.ask@REDACTED
Thu Dec 30 09:32:02 CET 2004


[this is a resend, using my .se address. Maybe the previous post, sent
yesterday using the .com address, was taken out by some filter...]

Hi,

I reported this behavior before, see
http://www.erlang.org/ml-archive/erlang-questions/200404/msg00184.html
but I got no answers at that time. I realize you may see this as a
question about a specific OS/machine configuration, and thus of no
general interest to the Erlang community. There was one thing I did
not notice when I first asked erlang-questions, though: a crash dump
is generated. So even if the problem is OS-specific, I think it is odd
that Erlang crashes.

I get the same behavior on my Gentoo machine at home, which has a 2.6
kernel (standard out-of-the-box configuration) and a freshly emerged
R10B-0 Erlang. Maybe it is something to take a look at?

I would be very grateful if you had the time to do that, or could at
least give me some hints about what to look for in the setup of my machine.
I need to use the slave module to build our application locally.
Everything else works just fine.

rgds,

/Gösta Ask (at Mobile Arts dot com)

==============================================================
             The logs
==============================================================

On a Solaris machine, postiljon, the slave node starts fine:

bash-2.05$ which erl
/opt/MA/otp/R9C-0/bin/erl
bash-2.05$ erl -sname foo
Erlang (BEAM) emulator version 5.3 [source] [hipe]

Eshell V5.3  (abort with ^G)
(foo@REDACTED)1> net_adm:ping(bar@REDACTED).
pong
(foo@REDACTED)2> nodes().
[bar@REDACTED]
(foo@REDACTED)3> slave:start(postiljon, foobar).
{ok,foobar@REDACTED}
(foo@REDACTED)4> nodes().
[bar@REDACTED,foobar@REDACTED]
(foo@REDACTED)7> slave:stop(foobar@REDACTED).
ok
(foo@REDACTED)8> nodes().
[bar@REDACTED]
(foo@REDACTED)9>
(foo@REDACTED)10> inet_db:get_rc().
[{domain,"mobilearts.local"},{nameserver,{192,168,211,3}}]
(foo@REDACTED)11> inet_db:res_option(lookup).
[file,dns]

User switch command
  --> q

bash-2.05$ uname -a
SunOS postiljon 5.9 Generic_112233-10 sun4u sparc SUNW,UltraAX-i2
bash-2.05$

==============================================================

But on my Gentoo machine (falcon) it fails:

askg@REDACTED askg $ which erl
/usr/local/bin/erl
askg@REDACTED askg $ erl -sname foo
Erlang (BEAM) emulator version 5.3.6.2 [source] [hipe]

Eshell V5.3.6.2  (abort with ^G)
(foo@REDACTED)1>
(foo@REDACTED)2> net_adm:ping(bar@REDACTED).
pong
(foo@REDACTED)3> nodes().
[bar@REDACTED]

[start the slave node in debug mode]

(foo@REDACTED)10> dbg:c(slave,start,[falcon, foobar]).
(<0.58.0>) init ! {<0.58.0>,{get_argument,progname}}
(<0.58.0>) out {init,request,1}
(<0.58.0>) << {init,{ok,[["erl"]]}}
(<0.58.0>) in {init,request,1}
(<0.58.0>) << timeout
(<0.58.0>) <0.18.0> ! {'$gen_call',{<0.58.0>,#Ref<0.0.0.308>},longnames}
(<0.58.0>) out {gen,wait_resp_mon,3}
(<0.58.0>) << {#Ref<0.0.0.308>,false}
(<0.58.0>) in {gen,wait_resp_mon,3}
(<0.58.0>) << timeout
(<0.58.0>) << timeout
(<0.58.0>) <0.18.0> ! {'$gen_call',{<0.58.0>,#Ref<0.0.0.310>},
                                    {connect,normal,foobar@REDACTED}}
(<0.58.0>) out {gen,wait_resp_mon,3}
(<0.58.0>) << {#Ref<0.0.0.310>,false}
(<0.58.0>) in {gen,wait_resp_mon,3}
(<0.58.0>) << timeout
(<0.58.0>) <0.58.0> ! {'DOWN',#Ref<0.0.0.322>,
                               process,
                               {net_kernel,foobar@REDACTED},
                               noconnection}
(<0.58.0>) << {'DOWN',#Ref<0.0.0.322>,
                       process,
                       {net_kernel,foobar@REDACTED},
                       noconnection}
[garbage coll.]

(<0.58.0>) << timeout
(<0.58.0>) <0.18.0> ! {'$gen_call',{<0.58.0>,#Ref<0.0.0.323>},
                                    {disconnect,foobar@REDACTED}}
(<0.58.0>) out {gen,wait_resp_mon,3}
(<0.58.0>) << {#Ref<0.0.0.323>,false}
(<0.58.0>) in {gen,wait_resp_mon,3}
(<0.58.0>) << timeout

[here is the call to spawn which tries to start the slave node]

(<0.58.0>) spawn <0.61.0> as slave:wait_for_slave(<0.58.0>,"falcon",foobar,foobar@REDACTED,[],no_link,erl)
(<0.58.0>) out {slave,start_it,6}
(<0.61.0>) in {slave,wait_for_slave,7}
(<0.61.0>) register slave_waiter_0
(<0.61.0>) << timeout
(<0.61.0>) <0.18.0> ! {'$gen_call',{<0.61.0>,#Ref<0.0.0.326>},longnames}
(<0.61.0>) out {gen,wait_resp_mon,3}
(<0.61.0>) << {#Ref<0.0.0.326>,false}
(<0.61.0>) in {gen,wait_resp_mon,3}
(<0.61.0>) << timeout

[garbage coll.]

(<0.61.0>) out {slave,wait_for_slave,7}
(<0.61.0>) in {slave,wait_for_slave,7}
(<0.61.0>) << timeout

[garbage coll.]

(<0.61.0>) << timeout
(<0.61.0>) <0.18.0> ! {'$gen_call',{<0.61.0>,#Ref<0.0.0.352>},
                                    {connect,normal,foobar@REDACTED}}
(<0.61.0>) out {gen,wait_resp_mon,3}
(<0.61.0>) << {#Ref<0.0.0.352>,false}
(<0.61.0>) in {gen,wait_resp_mon,3}
(<0.61.0>) << timeout
(<0.61.0>) <0.61.0> ! {'DOWN',#Ref<0.0.0.361>,
                               process,
                               {net_kernel,foobar@REDACTED},
                               noconnection}
(<0.61.0>) << {'DOWN',#Ref<0.0.0.361>,
                       process,
                       {net_kernel,foobar@REDACTED},
                       noconnection}
(<0.61.0>) << timeout
(<0.61.0>) <0.18.0> ! {'$gen_call',{<0.61.0>,#Ref<0.0.0.362>},
                                    {disconnect,foobar@REDACTED}}
(<0.61.0>) out {gen,wait_resp_mon,3}
(<0.61.0>) << {#Ref<0.0.0.362>,false}
(<0.61.0>) in {gen,wait_resp_mon,3}

[here is the timeout]

(<0.61.0>) << timeout
(<0.61.0>) <0.58.0> ! {result,{error,timeout}}
(<0.58.0>) << {result,{error,timeout}}
(<0.61.0>) exit normal
(<0.61.0>) unregister slave_waiter_0
(<0.58.0>) in {slave,start_it,6}
{error,timeout}

(foo@REDACTED)13> inet_db:get_rc().
[{domain,"mobilearts.local"},
  {nameserver,{192,168,211,3}},
  {nameserver,{80,252,160,162}},
  {nameserver,{80,252,160,164}}]
(foo@REDACTED)14> inet_db:res_option(lookup).
[file,dns]
(foo@REDACTED)15>

askg@REDACTED askg $ uname -a
Linux falcon 2.4.20-gentoo-r6 #1 Fri Feb 27 10:59:40 CET 2004 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux

An erl_crash.dump file is created:

askg@REDACTED askg $ ls -l erl*
-rw-r-----    1 askg     users      155290 Dec 29 09:30 erl_crash.dump

(foo@REDACTED)16> crashdump_viewer:start().
WebTool is available at http://localhost:8888/
Or  http://127.0.0.1:8888/
ok

The viewer shows, for example, under "General":

Slogan                 Kernel pid terminated (application_controller) (shutdown)
Node name              'nonode@REDACTED'
Crashdump created on   Wed Dec 29 09:30:15 2004

and as "Processes":

Pid       Name             Spawned as                State    Reductions  Stack+heap  MsgQ Length
<0.0.0>   init             otp_ring0:start/2         Running  4621        6765        1
<0.2.0>   erl_prim_loader  erlang:apply/2            Waiting  9119        233         0
<0.4.0>   error_logger     proc_lib:init_p/5         Waiting  994         233         0
<0.12.0>                   global:init_the_locker/1  Waiting  4           233         0

and "Expand MsgQ" for <0.0.0> reports:

{'EXIT',<0.1.0>,
         {noproc,{gen_server,call,
                             [application_controller,
                              {load_application,stdlib},
                              infinity]}}}

==============================================================



