node startup crash

Serge Aleynikov serge@REDACTED
Tue Apr 26 01:01:59 CEST 2005


While troubleshooting the "erl -sname a" startup crash, I determined
that the problem is indeed related to epmd.  I slightly modified
epmd_srv.c to output more tracing:

--- /home/serge/tmp/epmd_srv.c  2005-04-25 18:45:31.328089912 -0400
+++ epmd_srv.c  2005-04-25 18:41:23.570754752 -0400
@@ -227,8 +227,12 @@
           if (g->delay_accept)          /* Test of busy server */
             sleep(g->delay_accept);

-         if(FD_ISSET(listensock,&read_mask))
-           do_accept(g,listensock);
+          dbg_tty_printf(g,2,"select() triggered. read_mask=%d (listensock=%d)", read_mask, listensock);
+
+         if(FD_ISSET(listensock,&read_mask)) {
+           dbg_tty_printf(g,2,"accepting the socket");
+            do_accept(g,listensock);
+          }

           /* Go over all connections and look for open ones */
           {

Then I ran epmd in debug mode ("epmd -d -d"), started "erl" in a
separate terminal, and typed:

 > net_kernel:start([a]).

The call hung and then crashed after a timeout:
=INFO REPORT==== 25-Apr-2005::18:51:07 ===
Protocol: "inet_tcp": register error: {timeout,
                                           {gen_server,
                                               call,
                                               [erl_epmd,
                                                {register,a,35418},
                                                15000]}}
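
The report above shows that the gen_server call to erl_epmd got no answer to its register request within 15000 ms. One way to check independently whether epmd answers at all is to send it a request by hand. Below is a minimal sketch, not taken from the OTP sources: build_names_req() and probe_epmd() are hypothetical names, and the request layout (a 2-byte big-endian length followed by the request byte 110, 'n') is my understanding of the epmd NAMES request. If connect() succeeds but the read times out, that would be consistent with the kernel queueing the connection while epmd never accepts it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define EPMD_PORT      4369   /* port shown in the epmd trace */
#define EPMD_NAMES_REQ 110    /* 'n': ask epmd to list registered nodes */

/* Build a NAMES request: 2-byte big-endian length, then the request byte. */
size_t build_names_req(unsigned char *buf)
{
    buf[0] = 0;
    buf[1] = 1;
    buf[2] = EPMD_NAMES_REQ;
    return 3;
}

/* Hypothetical probe: returns the number of reply bytes read, or -1 if
 * epmd could not be reached or did not answer within timeout_sec. */
long probe_epmd(const char *ip, int timeout_sec, char *reply, size_t len)
{
    unsigned char req[3];
    size_t reqlen = build_names_req(req);
    struct sockaddr_in addr;
    struct timeval tv;
    long n = -1;
    int fd;

    assert(ip != NULL && reply != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(EPMD_PORT);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* SO_RCVTIMEO makes read() fail instead of blocking forever. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) == 1 &&
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0 &&
        connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        write(fd, req, reqlen) == (ssize_t)reqlen)
        n = (long)read(fd, reply, len);

    close(fd);
    return n;
}
```

A call such as probe_epmd("127.0.0.1", 5, buf, sizeof buf) returning -1 while epmd is running would point at epmd itself rather than at the Erlang node. As far as I know, a healthy epmd replies with a 4-byte port number followed by text lines naming the registered nodes.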

This is what I saw in the epmd trace:
epmd: Mon Apr 25 18:50:48 2005: epmd running - daemon = 0
epmd: Mon Apr 25 18:50:48 2005: try to initiate listening port 4369
epmd: Mon Apr 25 18:50:48 2005: starting
epmd: Mon Apr 25 18:50:48 2005: entering the main select() loop
epmd: Mon Apr 25 18:50:53 2005: select() triggered. read_mask=0 (listensock=0)
epmd: Mon Apr 25 18:50:58 2005: select() triggered. read_mask=0 (listensock=0)
epmd: Mon Apr 25 18:51:03 2005: select() triggered. read_mask=0 (listensock=0)
epmd: Mon Apr 25 18:51:08 2005: select() triggered. read_mask=0 (listensock=0)

What's extremely odd is that the listensock fd is not set in the
read_mask, even though the select() call woke up detecting data...
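
One caveat about the trace above: read_mask is an fd_set, not an int, so passing it to a %d conversion is undefined behaviour, and the printed read_mask=0 (and possibly the listensock value printed after it) may not reflect the real state. A select() that returns at regular five-second intervals may also simply be timing out, in which case it returns 0 and no descriptor is expected to be set. Below is a minimal sketch of a more reliable trace, assuming a hypothetical helper trace_select() that is not part of epmd; it records select()'s return value and tests the descriptor with FD_ISSET instead of printing the set.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical helper (not part of epmd): summarise what select() reported.
 * nready is select()'s return value; fd is the descriptor of interest.
 * Writes a one-line summary into buf and returns 1 if fd is readable. */
int trace_select(char *buf, size_t len, int nready,
                 fd_set *read_mask, int fd)
{
    int is_set;

    /* FD_ISSET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    /* Only consult the set when select() says something is ready;
     * on timeout (0) or error (-1) its contents are not meaningful. */
    is_set = (nready > 0) && FD_ISSET(fd, read_mask);

    snprintf(buf, len, "select() returned %d, listensock=%d, readable=%d",
             nready, fd, is_set);
    return is_set;
}
```

In the patched loop this could be called right after select() with its return value, and the resulting buffer passed to dbg_tty_printf(g,2,"%s",buf). A return value of 0 from select() on every line would indicate timeouts rather than wake-ups on incoming data.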

I suppose that there's some incompatibility with libc, but wouldn't
configure take care of this during the build process? Or is the
problem, perhaps, that I am running the 2.6.8.1 kernel on this host?

~/tmp/otp_src_R10B-4/erts/epmd/src>uname -a
Linux stardev1.corp.idt.net 2.6.8.1 #2 SMP Tue Sep 28 16:04:54 EDT 2004 i686 i686 i386 GNU/Linux

 >ldd /home/serge/tmp/otp_src_R10B-4/bin/i686-pc-linux-gnu/epmd
         libncurses.so.5 => /usr/lib/libncurses.so.5 (0x4002c000)
         libdl.so.2 => /lib/libdl.so.2 (0x4006c000)
         libm.so.6 => /lib/tls/libm.so.6 (0x4006f000)
         libc.so.6 => /lib/tls/libc.so.6 (0x40091000)
         libgpm.so.1 => /usr/lib/libgpm.so.1 (0x401c8000)
         /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
Serge


Serge Aleynikov wrote:
> Thanks for the advice, but I tried that a couple of times before 
> submitting this posting.  Then I also did:
> 
> ~/otp_src_R10B-4>killall epmd beam
> ~/otp_src_R10B-4>rm -fr /usr/local/lib/erlang
> ~/otp_src_R10B-4>make install
> ~/otp_src_R10B-4>erl -sname abc
> {error_logger,{{2005,4,23},{21,41,36}},'Protocol: ~p: register error: ~p~n',[inet_tcp,{timeout,{gen_server,call,[erl_epmd,{register,abc,35363},15000]}}]}
> 
> {error_logger,{{2005,4,23},{21,41,36}},crash_report,[[{pid,<0.18.0>},{registered_name,net_kernel},{error_info,{error,badarg} 
> 
> ...
> 
> The same crash.  There's not a lot of stuff running on this host either.
> Below is the output of "ps -ax".  We can also see that epmd was started
> from the right distribution (erts-5.4.5).  This is somewhat annoying, as
> I am not sure what else can be checked in order to overcome this
> crash...
> 
> Thank you.
> 
> Serge
> 
>   PID TTY      STAT   TIME COMMAND
>     1 ?        S      0:00 init [3]
>     2 ?        SW     0:00 [migration/0]
>     3 ?        SWN    0:00 [ksoftirqd/0]
>     4 ?        SW     0:00 [migration/1]
>     5 ?        SWN    0:00 [ksoftirqd/1]
>     6 ?        SW     0:03 [migration/2]
>     7 ?        SWN    0:00 [ksoftirqd/2]
>     8 ?        SW     0:00 [migration/3]
>     9 ?        SWN    0:00 [ksoftirqd/3]
>    10 ?        SW<    0:00 [events/0]
>    11 ?        SW<    0:00 [events/1]
>    12 ?        SW<    0:00 [events/2]
>    13 ?        SW<    0:00 [events/3]
>    14 ?        SW<    0:00 [khelper]
>    15 ?        SW<    0:00 [kblockd/0]
>    16 ?        SW<    0:00 [kblockd/1]
>    17 ?        SW<    0:00 [kblockd/2]
>    18 ?        SW<    0:00 [kblockd/3]
>    53 ?        SW     0:00 [kirqd]
>    54 ?        SW     0:00 [pdflush]
>    55 ?        SW     0:11 [pdflush]
>    56 ?        SW     0:00 [kswapd0]
>    57 ?        SW<    0:00 [aio/0]
>    58 ?        SW<    0:00 [aio/1]
>    59 ?        SW<    0:00 [aio/2]
>    60 ?        SW<    0:00 [aio/3]
>   153 ?        SW     0:00 [kseriod]
>   190 ?        SW     0:00 [scsi_eh_0]
>   191 ?        SW     0:00 [aacraid]
>   204 ?        SW     0:18 [kjournald]
>   520 ?        SW     0:00 [khubd]
>  1043 ?        SW     0:00 [kjournald]
>  1044 ?        SW     0:00 [kjournald]
>  1045 ?        SW     0:06 [kjournald]
>  1463 ?        S      0:00 syslogd -m 0
>  1467 ?        S      0:00 klogd -x
>  1859 ?        S      0:02 /usr/sbin/sshd
>  1884 ?        S      0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
>  1947 ?        S      0:00 crond
>  2010 ?        S      0:00 proftpd: (accepting connections)
>  2017 tty1     S      0:00 /sbin/mingetty tty1
>  2018 tty2     S      0:00 /sbin/mingetty tty2
>  2019 tty3     S      0:00 /sbin/mingetty tty3
>  2020 tty4     S      0:00 /sbin/mingetty tty4
>  2021 tty5     S      0:00 /sbin/mingetty tty5
>  2022 tty6     S      0:00 /sbin/mingetty tty6
> 11707 ?        S      0:00 sshd: serge [priv]
> 11709 ?        S      0:01 sshd: serge@REDACTED/1
> 11710 pts/1    S      0:01 -bash
> 31265 pts/1    S      0:00 rxvt -sr -fg white -bg black -fn Clean -sl 300
> 31266 pts/0    S      0:00 bash
> 27564 ?        S      0:00 /usr/local/lib/erlang/erts-5.4.5/bin/epmd -daemon
> 27557 pts/0    R      0:00 ps -ax
> 
> klacke@REDACTED wrote:
> 
>> On Fri, Apr 22, 2005 at 10:05:23AM -0400, Serge Aleynikov wrote:
>>
>>> Hi,
>>>
>>> I upgraded Erlang on one of our Linux servers to R10B-4, and started 
>>> getting this error at startup.  This problem wasn't observed before 
>>> the upgrade.
>>>
>>>
>>>> erl -sname a
>>>
>>>
>>> {error_logger,{{2005,4,21},{21,30,39}},'Protocol: ~p: register error: ~p~n',[inet_tcp,{timeout,{gen_server,call,[erl_epmd,{register,a,35269},15000]}}]}
>>>
>>> {error_logger,{{2005,4,21},{21,30,39}},crash_report,
>>
>>
>>
>> .........
>>
>> One of the early Erlang slogans was: "Symbolic information is always
>> available".
>> Somehow, that sprang to my mind when I saw your highly _symbolic_
>> error message :-)
>>
>>
>> It does, though, look as if there is some trouble while
>> registering the node name with epmd.
>> Maybe incompatible epmd versions? Try 'killall epmd' and retry.
>>
>>
>> /klacke
> 
> 

-- 
================================================================
| Serge Aleynikov                          Tel: (973) 438-3436
| MIS Telecom                              Fax: (973) 438-1457
| IDT Corp.                                   serge@REDACTED
================================================================


