[erlang-questions] rabbit, epmd and bonded interfaces woes

Leonard Boyce <>
Fri Jan 13 20:52:06 CET 2012


Hopefully someone has hit this issue and can shed some light.

The network interfaces are configured as bonded pairs (see the configuration below).

Rabbit crashes/dumps on startup with the error:
"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,
{error,epmd_close}}....<snip>

In my research to date the only similar issue I've been able to find is a 
reference to running ejabberd in FreeBSD jails and the solution was to patch 
epmd to allow all callers (not limit to 127.x.x.x), which is not really safe.

We've tried running epmd -d -d -d, connecting from another terminal using 
"erl -sname somestrangename", and using tcpdump to inspect the connection. 
tcpdump shows that "erl -sname somestrangename" seems to be calling from the 
public interface.

I have a sneaking suspicion that this has something to do with incorrect 
handling of bonded interfaces as another server with exactly the same 
OS/hardware and software versions (minus bonded interfaces) works perfectly.

We've also tried R15B and the results are exactly the same.

Any advice/help would be appreciated.

Thanks,
Leonard
---


Environment;
#############################################
Ubuntu 10.04 LTS
Linux web1 2.6.32-37-server #81-Ubuntu SMP Fri Dec 2 20:49:12 UTC 2011 x86_64 GNU/Linux
Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:16:16] [rq:16] [async-threads:0] [kernel-poll:false]


File: /etc/hostname
#############################################
web1

File: /etc/hosts
#############################################
127.0.0.1       web1 localhost

192.168.100.1   web1 web1.XXXXXX.XXX
192.168.100.83  web2 web2.XXXXXX.XXX

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


File: /etc/network/interfaces
#############################################
# The loopback network interface
auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
        address XX.XX.XX.XX
        netmask 255.255.255.240
        gateway xx.xx.xx.xx
        bond-slaves eth0 eth1
        bond_mode 802.3ad
        bond_miimon 100
        bond_lacp_rate 1
auto bond1
iface bond1 inet static
        address 192.168.100.1
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond_mode 802.3ad
        bond_miimon 100
        bond_lacp_rate 1
auto bond1:0
iface bond1:0 inet static
        address 192.168.100.2
        netmask 255.255.255.0

auto bond1:1
iface bond1:1 inet static
        address 192.168.100.3
        netmask 255.255.255.0

TCP Dump (sanitized XX.XX.XX.XX for public IP);
#############################################
:~$ sudo tcpdump -i lo -vv port 4369
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
14:41:27.873868 IP (tos 0x0, ttl 64, id 61955, offset 0, flags [DF], proto TCP (6), length 60)
    XX.XX.XX.XX.42982 > web1.4369: Flags [S], cksum 0xba6b (correct), seq 2754024620, win 32792, options [mss 16396,sackOK,TS val 24962735 ecr 0,nop,wscale 7], length 0
14:41:27.873884 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    web1.4369 > web1.42982: Flags [S.], cksum 0xeb9d (correct), seq 4276970213, ack 2754024621, win 32768, options [mss 16396,sackOK,TS val 24962735 ecr 24962735,nop,wscale 7], length 0
14:41:27.873895 IP (tos 0x0, ttl 64, id 61956, offset 0, flags [DF], proto TCP (6), length 52)
    XX.XX.XX.XX.42982 > web1.4369: Flags [.], cksum 0x5897 (correct), seq 2754024621, ack 4276970214, win 257, options [nop,nop,TS val 24962735 ecr 24962735], length 0
14:41:27.873938 IP (tos 0x0, ttl 64, id 61957, offset 0, flags [DF], proto TCP (6), length 82)
    XX.XX.XX.XX.42982 > web1.4369: Flags [P.], seq 0:30, ack 1, win 257, options [nop,nop,TS val 24962735 ecr 24962735], length 30
14:41:27.873945 IP (tos 0x0, ttl 64, id 33865, offset 0, flags [DF], proto TCP (6), length 52)
    web1.4369 > web1.42982: Flags [.], cksum 0xd3a4 (correct), seq 1, ack 31, win 256, options [nop,nop,TS val 24962735 ecr 24962735], length 0
14:41:27.874143 IP (tos 0x0, ttl 64, id 33866, offset 0, flags [DF], proto TCP (6), length 52)
    web1.4369 > web1.42982: Flags [F.], cksum 0xd3a3 (correct), seq 1, ack 31, win 256, options [nop,nop,TS val 24962735 ecr 24962735], length 0
14:41:27.874188 IP (tos 0x0, ttl 64, id 61958, offset 0, flags [DF], proto TCP (6), length 52)
    XX.XX.XX.XX.42982 > web1.4369: Flags [F.], cksum 0x5877 (correct), seq 30, ack 2, win 257, options [nop,nop,TS val 24962735 ecr 24962735], length 0
14:41:27.874202 IP (tos 0x0, ttl 64, id 33867, offset 0, flags [DF], proto TCP (6), length 52)
    web1.4369 > web1.42982: Flags [.], cksum 0xd3a2 (correct), seq 2, ack 32, win 256, options [nop,nop,TS val 24962735 ecr 24962735], length 0
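
To take Erlang out of the picture, the source-address observation in the capture can probably be reproduced with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and it assumes that epmd's locality test amounts to "source address is in 127.0.0.0/8", which is what the 127.x.x.x restriction mentioned earlier suggests but which I have not confirmed against the epmd source.

```python
import ipaddress
import socket


def loopback_source_address(host="127.0.0.1"):
    """Connect to a throwaway listener on `host` and return the
    source IP address the kernel picked for the client socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind((host, 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client.connect((host, port))
            return client.getsockname()[0]
        finally:
            client.close()
    finally:
        listener.close()


def is_loopback(addr):
    """Assumed approximation of epmd's check: is addr in 127.0.0.0/8?"""
    return ipaddress.ip_address(addr) in ipaddress.ip_network("127.0.0.0/8")


if __name__ == "__main__":
    src = loopback_source_address()
    print(src, "loopback" if is_loopback(src) else "NOT loopback")
```

If this reports a non-loopback source address on the bonded host and a 127.x.x.x address on the working one, the problem would appear to be in the host's source-address selection rather than in Erlang itself.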



epmd output;
#############################################
:/usr/local/src# epmd -d -d -d
epmd: Fri Jan 13 14:41:25 2012: epmd running - daemon = 0
epmd: Fri Jan 13 14:41:25 2012: try to initiate listening port 4369
epmd: Fri Jan 13 14:41:25 2012: entering the main select() loop
epmd: Fri Jan 13 14:41:27 2012: Non-local peer connected
epmd: Fri Jan 13 14:41:27 2012: time in seconds: 1326483687
epmd: Fri Jan 13 14:41:27 2012: opening connection on file descriptor 4
epmd: Fri Jan 13 14:41:27 2012: time in seconds: 1326483687
epmd: Fri Jan 13 14:41:27 2012: got 30 bytes
***** 00000000  00 1c 78 b8 4a 4d 00 00  05 00 05 00 0f 73 6f 6d  |..x.JM.......som|
***** 00000010  65 73 74 72 61 6e 67 65  6e 61 6d 65 00 00        |estrangename..|
epmd: Fri Jan 13 14:41:27 2012: time in seconds: 1326483687
epmd: Fri Jan 13 14:41:27 2012: ** got ALIVE2_REQ
epmd: Fri Jan 13 14:41:27 2012: ALIVE2_REQ from non local address
epmd: Fri Jan 13 14:41:27 2012: closing connection on file descriptor 4
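
For reference, the 30 bytes in the epmd output above can be decoded with a short Python sketch. The field layout used here (2-byte length, tag 120, port, node type, protocol, highest/lowest version, name length, name, extra length, extra) is my understanding of ALIVE2_REQ from the Erlang distribution protocol documentation; the function name and dictionary keys are just for illustration.

```python
import struct

# The 30 bytes epmd logged, copied from the hex dump.
PACKET = bytes.fromhex(
    "001c78b84a4d0000050005000f736f6d"
    "65737472616e67656e616d650000"
)


def decode_alive2_req(packet):
    """Decode an ALIVE2_REQ message (assumed layout, big-endian fields)."""
    (length,) = struct.unpack_from(">H", packet, 0)
    tag, port, node_type, protocol, highest, lowest, name_len = \
        struct.unpack_from(">BHBBHHH", packet, 2)
    name_start = 13  # 2 length bytes + 11 bytes of fixed fields
    name = packet[name_start:name_start + name_len].decode("latin-1")
    (extra_len,) = struct.unpack_from(">H", packet, name_start + name_len)
    extra_start = name_start + name_len + 2
    extra = packet[extra_start:extra_start + extra_len]
    return {
        "length": length,
        "tag": tag,            # 120 ('x') is ALIVE2_REQ
        "port": port,          # distribution port the node listens on
        "node_type": node_type,
        "protocol": protocol,
        "highest_version": highest,
        "lowest_version": lowest,
        "name": name,
        "extra": extra,
    }


if __name__ == "__main__":
    for key, value in decode_alive2_req(PACKET).items():
        print(key, value)
```

Decoded this way, the request looks like a well-formed registration for "somestrangename", which suggests the rejection comes from the peer address check and not from the request contents.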





