[erlang-questions] crash dump at ejabberd startup

tom@REDACTED tom@REDACTED
Thu Nov 18 20:55:40 CET 2010


Hello Michael,

Sorry for the delay, I was out of the office for one day.

OK, to be precise, I repeated the test:

I first started tcpdump and let it run.
I then started epmd -d -d -d in a second shell.
I finally started erl -name foo in a third shell.

epmd -d -d -d now printed something more informative:

# epmd -d -d -d
epmd: Thu Nov 18 19:26:33 2010: epmd running - daemon = 0
epmd: Thu Nov 18 19:26:33 2010: try to initiate listening port 4369
epmd: Thu Nov 18 19:26:33 2010: starting
epmd: Thu Nov 18 19:26:33 2010: entering the main select() loop
epmd: Thu Nov 18 19:26:38 2010: time in seconds: 1290108398
epmd: Thu Nov 18 19:26:43 2010: time in seconds: 1290108403
epmd: Thu Nov 18 19:26:44 2010: Non-local peer connected
epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
epmd: Thu Nov 18 19:26:44 2010: opening connection on file descriptor 4
epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
epmd: Thu Nov 18 19:26:44 2010: got 18 bytes
***** 00000000  00 10 78 7c e5 4d 00 00  05 00 05 00 03 66 6f 6f  |..x|.M.......foo|
***** 00000010  00 00                                             |..|
epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
epmd: Thu Nov 18 19:26:44 2010: ** got ALIVE2_REQ
epmd: Thu Nov 18 19:26:44 2010: ALIVE2_REQ from non local address
epmd: Thu Nov 18 19:26:44 2010: closing connection on file descriptor 4
epmd: Thu Nov 18 19:26:49 2010: time in seconds: 1290108409
epmd: Thu Nov 18 19:26:54 2010: time in seconds: 1290108414
epmd: Thu Nov 18 19:26:59 2010: time in seconds: 1290108419
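
The 18 bytes in the hex dump above can be decoded by hand. The following is a minimal Python sketch, assuming the ALIVE2_REQ field layout given in the Erlang distribution protocol documentation (2-byte length prefix, tag 120, port, node type, protocol, highest and lowest version, node name, extra data). The function name decode_alive2_req is made up for illustration and is not part of any Erlang tool.

```python
import struct

# Fixed-size part of the request body: tag, port, node type, protocol,
# highest version, lowest version, node name length (all big-endian).
FIXED = ">BHBBHHH"


def decode_alive2_req(packet: bytes) -> dict:
    """Decode an EPMD ALIVE2_REQ packet, including its 2-byte length prefix."""
    (length,) = struct.unpack_from(">H", packet, 0)
    body = packet[2:2 + length]
    if len(body) != length:
        raise ValueError("truncated packet")

    tag, port, node_type, protocol, highest, lowest, nlen = struct.unpack_from(
        FIXED, body, 0)
    if tag != 120:  # 120 (ASCII 'x') is the ALIVE2_REQ tag
        raise ValueError("not an ALIVE2_REQ")

    offset = struct.calcsize(FIXED)
    name = body[offset:offset + nlen].decode("latin-1")
    offset += nlen
    (elen,) = struct.unpack_from(">H", body, offset)
    extra = body[offset + 2:offset + 2 + elen]

    return {
        "port": port,
        "node_type": node_type,
        "protocol": protocol,
        "highest_version": highest,
        "lowest_version": lowest,
        "name": name,
        "extra": extra,
    }


# The bytes logged by epmd in the first test run.
request = bytes.fromhex("00 10 78 7c e5 4d 00 00 05 00 05 00 03 66 6f 6f 00 00")
print(decode_alive2_req(request))
```

For the dump above this yields node name foo, port 31973 and node type 77 (a normal, non-hidden node), so the request itself looks well-formed. The rejection that follows in the log appears to depend on where the connection came from, not on what it contained.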

The inetrc file I was using was:

{lookup,[file, dns]}.
{host,{64,120,5,168}, ["mail.kepos.org"]}.
{file, resolv, "/etc/resolv.conf"}.


I then tried the same with the following inetrc version:

{lookup,[file, dns]}.
{host,{127,0,0,1}, ["localhost"]}.
{file, resolv, "/etc/resolv.conf"}.


# epmd -d -d -d
epmd: Thu Nov 18 19:33:50 2010: epmd running - daemon = 0
epmd: Thu Nov 18 19:33:50 2010: try to initiate listening port 4369
epmd: Thu Nov 18 19:33:50 2010: starting
epmd: Thu Nov 18 19:33:50 2010: entering the main select() loop
epmd: Thu Nov 18 19:33:55 2010: time in seconds: 1290108835
epmd: Thu Nov 18 19:34:00 2010: time in seconds: 1290108840
epmd: Thu Nov 18 19:34:01 2010: Non-local peer connected
epmd: Thu Nov 18 19:34:01 2010: time in seconds: 1290108841
epmd: Thu Nov 18 19:34:01 2010: opening connection on file descriptor 4
epmd: Thu Nov 18 19:34:01 2010: time in seconds: 1290108841
epmd: Thu Nov 18 19:34:01 2010: got 18 bytes
***** 00000000  00 10 78 3a f4 4d 00 00  05 00 05 00 03 66 6f 6f  |..x:.M.......foo|
***** 00000010  00 00                                             |..|
epmd: Thu Nov 18 19:34:01 2010: time in seconds: 1290108841
epmd: Thu Nov 18 19:34:01 2010: ** got ALIVE2_REQ
epmd: Thu Nov 18 19:34:01 2010: ALIVE2_REQ from non local address
epmd: Thu Nov 18 19:34:01 2010: closing connection on file descriptor 4
epmd: Thu Nov 18 19:34:06 2010: time in seconds: 1290108846
epmd: Thu Nov 18 19:34:11 2010: time in seconds: 1290108851
^C

Well, you are the expert, not me, but doesn't this look as if epmd does not
like getting the connection from a "non-local" address?
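
The message "ALIVE2_REQ from non local address" suggests that epmd inspects the peer address of a connection before accepting a registration. The Python sketch below is not epmd's code. It assumes the check amounts to "is the peer address inside 127.0.0.0/8?" and shows how to find out which source address a loopback connection actually carries on a given host. Both helper names are hypothetical.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Assumed locality test: true only for loopback addresses such as 127.0.0.1."""
    return ipaddress.ip_address(address).is_loopback


def observed_peer(target: str = "127.0.0.1") -> str:
    """Open a temporary listener on `target`, connect to it, and return the
    source address that the listening side sees for that connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((target, 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect((target, port))
            conn, peer = server.accept()
            conn.close()
            return peer[0]
```

On an ordinary host, observed_peer() should return 127.0.0.1. If it returns the jail's own address (64.120.5.168 here) when run inside the jail, a loopback-only test like is_loopback would reject the connection, which would match the epmd log above.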


In both cases tcpdump did not show anything helpful
(see also my earlier posting).

# tcpdump -i lo0 -vv port 4369
tcpdump: WARNING: lo0: no IPv4 address assigned
tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 96 
bytes
19:34:01.445154 IP (tos 0x0, ttl 64, id 42836, offset 0, flags [DF], proto 
TCP (6), length 60, bad cksum 0 (->728)!)
    mail.58928 > mail.4369: Flags [S], cksum 0x1c18 (correct), seq 
1795656002, win 65535, options [mss 16344,nop,wscale 3,sackOK,TS val 
221277388 ecr 0], length 0
19:34:01.445174 IP (tos 0x0, ttl 64, id 42837, offset 0, flags [DF], proto 
TCP (6), length 60, bad cksum 0 (->727)!)
    mail.4369 > mail.58928: Flags [S.], cksum 0xab70 (correct), seq 
3890252666, ack 1795656003, win 65535, options [mss 16344,nop,wscale 
3,sackOK,TS val 1967491061 ecr 221277388], length 0
19:34:01.445186 IP (tos 0x0, ttl 64, id 42838, offset 0, flags [DF], proto 
TCP (6), length 52, bad cksum 0 (->72e)!)
    mail.58928 > mail.4369: Flags [.], cksum 0xf15c (correct), seq 1, ack 1, 
win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 0
19:34:01.445214 IP (tos 0x0, ttl 64, id 42839, offset 0, flags [DF], proto 
TCP (6), length 70, bad cksum 0 (->71b)!)
    mail.58928 > mail.4369: Flags [P.], cksum 0x07d5 (correct), seq 1:19, 
ack 1, win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 
18
19:34:01.445357 IP (tos 0x0, ttl 64, id 42846, offset 0, flags [DF], proto 
TCP (6), length 52, bad cksum 0 (->726)!)
    mail.4369 > mail.58928: Flags [F.], cksum 0xf149 (correct), seq 1, ack 
19, win 8960, options [nop,nop,TS val 1967491061 ecr 221277388], length 0
19:34:01.445368 IP (tos 0x0, ttl 64, id 42847, offset 0, flags [DF], proto 
TCP (6), length 52, bad cksum 0 (->725)!)
    mail.58928 > mail.4369: Flags [.], cksum 0xf149 (correct), seq 19, ack 
2, win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 0
19:34:01.445391 IP (tos 0x0, ttl 64, id 42850, offset 0, flags [DF], proto 
TCP (6), length 52, bad cksum 0 (->722)!)
    mail.58928 > mail.4369: Flags [F.], cksum 0xf148 (correct), seq 19, ack 
2, win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 0
19:34:01.445400 IP (tos 0x0, ttl 64, id 42851, offset 0, flags [DF], proto 
TCP (6), length 52, bad cksum 0 (->721)!)
    mail.4369 > mail.58928: Flags [.], cksum 0xf149 (correct), seq 2, ack 
20, win 8959, options [nop,nop,TS val 1967491061 ecr 221277388], length 0
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
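
Read packet by packet, the capture shows a complete handshake, one 18-byte push from the client, and then a FIN from port 4369. The Python sketch below extracts direction, flags and payload length from such output. The sample input is an abridged copy of the lines above (wrapped lines joined, most fields dropped), and the regular expression is an assumption about this particular tcpdump format, not a general parser.

```python
import re

# Abridged copy of the capture: one packet per line, checksums and options removed.
SAMPLE = """\
mail.58928 > mail.4369: Flags [S], seq 1795656002, length 0
mail.4369 > mail.58928: Flags [S.], seq 3890252666, ack 1795656003, length 0
mail.58928 > mail.4369: Flags [.], seq 1, ack 1, length 0
mail.58928 > mail.4369: Flags [P.], seq 1:19, ack 1, length 18
mail.4369 > mail.58928: Flags [F.], seq 1, ack 19, length 0
mail.58928 > mail.4369: Flags [.], seq 19, ack 2, length 0
mail.58928 > mail.4369: Flags [F.], seq 19, ack 2, length 0
mail.4369 > mail.58928: Flags [.], seq 2, ack 20, length 0
"""

PACKET = re.compile(r"(\S+) > (\S+): Flags \[([^\]]+)\].*length (\d+)$")


def summarize(capture: str) -> list:
    """Return (source, destination, flags, payload_length) for each packet line."""
    events = []
    for line in capture.splitlines():
        match = PACKET.search(line.strip())
        if match:
            src, dst, flags, length = match.groups()
            events.append((src, dst, flags, int(length)))
    return events


for src, dst, flags, length in summarize(SAMPLE):
    print(f"{src} -> {dst}  flags={flags}  payload={length}")
```

The fifth entry is the interesting one: the side listening on port 4369 sends a FIN right after the 18-byte request, which lines up with the epmd log entry "closing connection on file descriptor 4".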



So I still suspect there is a problem because Erlang somehow insists on
using localhost exclusively, which isn't a good thing for FreeBSD jails, as
jails have no fully functional localhost (127.0.0.1 and localhost exist and
answer pings, yes, but there are limitations nonetheless).

If there were a way to make Erlang use a configurable IP address instead of
localhost, the issue would most probably be resolved.

I therefore tried varying the content of the inetrc file, but it seems that
is not enough to really point Erlang to the true IP address.

As you asked about mail.kepos.org and 64.120.5.168:
these are the jail's hostname and single IP address, which are also used
without problems by Postfix, Dovecot and MySQL in the same jail.

In the meantime I got a hint from the freebsd-questions mailing list: the
person posting there mentioned that they use Erlang 13B to run ejabberd 2.1.5
in FreeBSD jails without any issue, having earlier faced the same problem I am
facing now (with Erlang 14B). This would also match my earlier experience with
an older Erlang half a year ago: no problems at that time (but I cannot
remember which Erlang version I used).

Although I of course cannot be sure, the recent changes in Erlang 14B you
mentioned might be the cause of all this. As I have no clue about Erlang,
I only want to mention this as one possibility.

What do you think?

Kind regards,
Tom





On Wednesday 17 November 2010 15:48:54 Michael Santos wrote:
> On Wed, Nov 17, 2010 at 04:59:52AM +0100, tom@REDACTED wrote:
> > > It might help. I think the quickest way to debug this would be to:
> > done, results below ...
> 
> Did you run these steps concurrently or sequentially? e.g., when you
> brought up the erlang node, was the debug epmd running?
> 
> epmd (with debug switches) needs to be running in one shell, the
> tcpdump's (with the "-n" switch) in another. Then start up the Erlang
> node.
> 
> > # tcpdump -i lo0 -vv port 4369
> > tcpdump: WARNING: lo0: no IPv4 address assigned
> > tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size
> > 96 bytes
> 
> Was the output below from the dump of the loopback?
> 
> > and tcpdump output:
> > 
> > 03:33:15.964680 IP (tos 0x0, ttl 64, id 24389, offset 0, flags [DF],
> > proto TCP (6), length 60, bad cksum 0 (->4f37)!)
> > 
> >     mail.kepos.org.10975 > mail.kepos.org.4369: Flags [S], cksum 0xd692
> > 
> > (correct), seq 3123827793, win 65535, options [mss 16344,nop,wscale
> > 3,sackOK,TS val 77847542 ecr 0], length 0
> > 03:33:15.964701 IP (tos 0x0, ttl 64, id 24390, offset 0, flags [DF],
> > proto TCP (6), length 60, bad cksum 0 (->4f36)!)
> > 
> >     mail.kepos.org.4369 > mail.kepos.org.10975: Flags [S.], cksum
> >     0x776a
> > 
> > (correct), seq 3252621940, ack 3123827794, win 65535, options [mss
> > 16344,nop,wscale 3,sackOK,TS val 2304704868 ecr 77847542], length 0
> > 03:33:15.964712 IP (tos 0x0, ttl 64, id 24391, offset 0, flags [DF],
> > proto TCP (6), length 52, bad cksum 0 (->4f3d)!)
> > 
> >     mail.kepos.org.4369 > mail.kepos.org.10975: Flags [F.], cksum
> >     0xbd43
> > 
> > (correct), seq 1, ack 19, win 8960, options [nop,nop,TS val 2304704868
> > ecr 77847542], length 0
> > 03:33:15.964906 IP (tos 0x0, ttl 64, id 24394, offset 0, flags [DF],
> > proto TCP (6), length 52, bad cksum 0 (->4f3a)!)
> > 
> >     mail.kepos.org.10975 > mail.kepos.org.4369: Flags [.], cksum 0xbd43
> > 
> > (correct), seq 19, ack 2, win 8960, options [nop,nop,TS val 77847542
> > ecr 2304704868], length 0
> > 03:33:15.964929 IP (tos 0x0, ttl 64, id 24395, offset 0, flags [DF],
> > proto TCP (6), length 52, bad cksum 0 (->4f39)!)
> > 
> >     mail.kepos.org.10975 > mail.kepos.org.4369: Flags [F.], cksum
> >     0xbd42
> > 
> > (correct), seq 19, ack 2, win 8960, options [nop,nop,TS val 77847542
> > ecr 2304704868], length 0
> > 03:33:15.964937 IP (tos 0x0, ttl 64, id 24396, offset 0, flags [DF],
> > proto TCP (6), length 52, bad cksum 0 (->4f38)!)
> > 
> >     mail.kepos.org.4369 > mail.kepos.org.10975: Flags [.], cksum 0xbd43
> > 
> > (correct), seq 2, ack 20, win 8959, options [nop,nop,TS val 2304704868
> > ecr 77847542], length 0
> > ^C
> 
> Do not resolve IP addresses. We need to see the source and destination
> IP addresses.
> 
> So something on port 4369 (on whatever "mail.kepos.org" is) is accepting
> and closing the connection. If it is not epmd (because there is nothing
> in your debug log), what is it?
> 
> > Just to exclude any misconfiguration on the machine I also tried it
> > again switching off the firewall for this test: same result.
> 
> Always a good idea.
> 
> 

