node startup crash
Serge Aleynikov
serge@REDACTED
Tue Apr 26 01:01:59 CEST 2005
In my attempt to troubleshoot this issue of a "erl -sname a" startup
crash, I determined that the problem is indeed related to epmd. I
slightly modified epmd_srv.c to output more tracing:
--- /home/serge/tmp/epmd_srv.c 2005-04-25 18:45:31.328089912 -0400
+++ epmd_srv.c 2005-04-25 18:41:23.570754752 -0400
@@ -227,8 +227,12 @@
if (g->delay_accept) /* Test of busy server */
sleep(g->delay_accept);
- if(FD_ISSET(listensock,&read_mask))
- do_accept(g,listensock);
+ dbg_tty_printf(g,2,"select() triggered. read_mask=%d
(listensock=%d)", read_mask, listensock);
+
+ if(FD_ISSET(listensock,&read_mask)) {
+ dbg_tty_printf(g,2,"accepting the socket");
+ do_accept(g,listensock);
+ }
/* Go over all connections and look for open ones */
{
Then I ran epmd in the debug mode ("epmd -d -d"), fired off an "erl" in
a separate terminal, and typed:
> net_kernel:start([a]).
The call hung, and crashed after a timeout
=INFO REPORT==== 25-Apr-2005::18:51:07 ===
Protocol: "inet_tcp": register error: {timeout,
{gen_server,
call,
[erl_epmd,
{register,a,35418},
15000]}}
This is what I saw in the epmd trace:
epmd: Mon Apr 25 18:50:48 2005: epmd running - daemon = 0
epmd: Mon Apr 25 18:50:48 2005: try to initiate listening port 4369
epmd: Mon Apr 25 18:50:48 2005: starting
epmd: Mon Apr 25 18:50:48 2005: entering the main select() loop
epmd: Mon Apr 25 18:50:53 2005: select() triggered. read_mask=0
(listensock=0)
epmd: Mon Apr 25 18:50:58 2005: select() triggered. read_mask=0
(listensock=0)
epmd: Mon Apr 25 18:51:03 2005: select() triggered. read_mask=0
(listensock=0)
epmd: Mon Apr 25 18:51:08 2005: select() triggered. read_mask=0
(listensock=0)
What's extreemly odd is that the listensock fd is not set in the
read_mask, even though the select() call woke up detecting data...
I suppose that there's some incompatibility with libc, but wouldn't
configure take care of this during the build process? Is there a
problem, perhaps that I am running the 2.6.8.1 kernel on this host?
~/tmp/otp_src_R10B-4/erts/epmd/src>uname -a
Linux stardev1.corp.idt.net 2.6.8.1 #2 SMP Tue Sep 28 16:04:54 EDT 2004
i686 i686 i386 GNU/Linux
>ldd /home/serge/tmp/otp_src_R10B-4/bin/i686-pc-linux-gnu/epmd
libncurses.so.5 => /usr/lib/libncurses.so.5 (0x4002c000)
libdl.so.2 => /lib/libdl.so.2 (0x4006c000)
libm.so.6 => /lib/tls/libm.so.6 (0x4006f000)
libc.so.6 => /lib/tls/libc.so.6 (0x40091000)
libgpm.so.1 => /usr/lib/libgpm.so.1 (0x401c8000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
Serge
Serge Aleynikov wrote:
> Thanks for the advice, but I tried that a couple of times before
> submitting this posting. Then I also did:
>
> ~/otp_src_R10B-4>killall epmd beam
> ~/otp_src_R10B-4>rm -fr /usr/local/lib/erlang
> ~/otp_src_R10B-4>make install
> ~/otp_src_R10B-4>erl -sname abc
> {error_logger,{{2005,4,23},{21,41,36}},'Protocol: ~p: register error:
> ~p~n',[inet_tcp,{timeout,{gen_server,call,[erl_epmd,{register,abc,35363},15000]}}]}
>
> {error_logger,{{2005,4,23},{21,41,36}},crash_report,[[{pid,<0.18.0>},{registered_name,net_kernel},{error_info,{error,badarg}
>
> ...
>
> The same crash. There's not a lot of stuff running on this host either.
> Below is the output of "ps -ax". We can also see that epmd was started
> from the right distribution (erts-5.4.5). This is somewhat annoying,
> as I am not sure what else can be checked in order to overcome this
> crash...
>
> Thank you.
>
> Serge
>
> PID TTY STAT TIME COMMAND
> 1 ? S 0:00 init [3]
> 2 ? SW 0:00 [migration/0]
> 3 ? SWN 0:00 [ksoftirqd/0]
> 4 ? SW 0:00 [migration/1]
> 5 ? SWN 0:00 [ksoftirqd/1]
> 6 ? SW 0:03 [migration/2]
> 7 ? SWN 0:00 [ksoftirqd/2]
> 8 ? SW 0:00 [migration/3]
> 9 ? SWN 0:00 [ksoftirqd/3]
> 10 ? SW< 0:00 [events/0]
> 11 ? SW< 0:00 [events/1]
> 12 ? SW< 0:00 [events/2]
> 13 ? SW< 0:00 [events/3]
> 14 ? SW< 0:00 [khelper]
> 15 ? SW< 0:00 [kblockd/0]
> 16 ? SW< 0:00 [kblockd/1]
> 17 ? SW< 0:00 [kblockd/2]
> 18 ? SW< 0:00 [kblockd/3]
> 53 ? SW 0:00 [kirqd]
> 54 ? SW 0:00 [pdflush]
> 55 ? SW 0:11 [pdflush]
> 56 ? SW 0:00 [kswapd0]
> 57 ? SW< 0:00 [aio/0]
> 58 ? SW< 0:00 [aio/1]
> 59 ? SW< 0:00 [aio/2]
> 60 ? SW< 0:00 [aio/3]
> 153 ? SW 0:00 [kseriod]
> 190 ? SW 0:00 [scsi_eh_0]
> 191 ? SW 0:00 [aacraid]
> 204 ? SW 0:18 [kjournald]
> 520 ? SW 0:00 [khubd]
> 1043 ? SW 0:00 [kjournald]
> 1044 ? SW 0:00 [kjournald]
> 1045 ? SW 0:06 [kjournald]
> 1463 ? S 0:00 syslogd -m 0
> 1467 ? S 0:00 klogd -x
> 1859 ? S 0:02 /usr/sbin/sshd
> 1884 ? S 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
> 1947 ? S 0:00 crond
> 2010 ? S 0:00 proftpd: (accepting connections)
> 2017 tty1 S 0:00 /sbin/mingetty tty1
> 2018 tty2 S 0:00 /sbin/mingetty tty2
> 2019 tty3 S 0:00 /sbin/mingetty tty3
> 2020 tty4 S 0:00 /sbin/mingetty tty4
> 2021 tty5 S 0:00 /sbin/mingetty tty5
> 2022 tty6 S 0:00 /sbin/mingetty tty6
> 11707 ? S 0:00 sshd: serge [priv]
> 11709 ? S 0:01 sshd: serge@REDACTED/1
> 11710 pts/1 S 0:01 -bash
> 31265 pts/1 S 0:00 rxvt -sr -fg white -bg black -fn Clean -sl 300
> 31266 pts/0 S 0:00 bash
> 27564 ? S 0:00 /usr/local/lib/erlang/erts-5.4.5/bin/epmd
> -daemon
> 27557 pts/0 R 0:00 ps -ax
>
> klacke@REDACTED wrote:
>
>> On Fri, Apr 22, 2005 at 10:05:23AM -0400, Serge Aleynikov wrote:
>>
>>> Hi,
>>>
>>> I upgraded Erlang on one of our Linux servers to R10B-4, and started
>>> getting this error at startup. This problem wasn't observed before
>>> the upgrade.
>>>
>>>
>>>> erl -sname a
>>>
>>>
>>> {error_logger,{{2005,4,21},{21,30,39}},'Protocol: ~p: register error:
>>> ~p~n',[inet_tcp,{timeout,{gen_server,call,[erl_epmd,{register,a,35269},15000]}}]}
>>>
>>> {error_logger,{{2005,4,21},{21,30,39}},crash_report,
>>
>>
>>
>> .........
>>
>> One of the early Erlang slogans was: "Symolic information is always
>> available".
>> Somehow, that spung to my mind what I saw your higly _symbolic_
>> error message :-)
>>
>>
>> It does though look as if there is some trouble while
>> registering the node name with epmd.
>> Maybe incompatible epmd vsns ??, 'killall epmd' and retry.
>>
>>
>> /klacke
>
>
--
================================================================
| Serge Aleynikov Tel: (973) 438-3436
| MIS Telecom Fax: (973) 438-1457
| IDT Corp. serge@REDACTED
================================================================
More information about the erlang-questions
mailing list