[erlang-questions] rabbit, epmd and bonded interfaces woes

Michael Santos michael.santos@REDACTED
Fri Jan 13 21:15:16 CET 2012


On Fri, Jan 13, 2012 at 02:52:06PM -0500, Leonard Boyce wrote:
> Hopefully someone has hit this issue and can shed some light.
> 
> The network interfaces are configured as bonded pairs.
> 
> Rabbit crashes/dumps when trying to start, with the error:
> "Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,epmd_close}}....<snip>
> 
> In my research to date the only similar issue I've been able to find is a 
> reference to running ejabberd in FreeBSD jails and the solution was to patch 
> epmd to allow all callers (not limit to 127.x.x.x), which is not really safe.

That patch reverts to the old behaviour of epmd (allow all query types
from remote clients).

> We've tried running epmd -d -d -d and calling it from another terminal using
> "erl -sname somestrangename", with tcpdump inspecting the connection. tcpdump
> shows that "erl -sname somestrangename" seems to be calling from the
> public interface.
> 
> I have a sneaking suspicion that this has something to do with incorrect 
> handling of bonded interfaces as another server with exactly the same 
> OS/hardware and software versions (minus bonded interfaces) works perfectly.
> 
> We've tried R15B and results are exactly the same.

Newer versions of epmd consider connections with the same source and
destination address as local. What are the source and destination
addresses for the TCP connection to epmd?
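
As a rough illustration of that rule (a sketch in Python, not epmd's actual
C code), a peer can be treated as local when its address is a loopback
address or when it equals the address the connection arrived on. The
function names and the 203.0.113.10 placeholder address are assumptions
made up for this example.

```python
import ipaddress
import socket


def looks_local(peer_ip: str, own_ip: str) -> bool:
    """Return True if the peer is loopback or has the same address as our end."""
    peer = ipaddress.ip_address(peer_ip)
    own = ipaddress.ip_address(own_ip)
    return peer.is_loopback or peer == own


def connection_looks_local(conn: socket.socket) -> bool:
    """Apply the same rule to an accepted TCP socket (hypothetical helper)."""
    peer_ip = conn.getpeername()[0]  # source address of the connection
    own_ip = conn.getsockname()[0]   # destination address of the connection
    return looks_local(peer_ip, own_ip)


if __name__ == "__main__":
    # Loopback to loopback: local.
    print(looks_local("127.0.0.1", "127.0.0.1"))
    # Same non-loopback address on both ends: local under the newer rule.
    print(looks_local("192.168.100.1", "192.168.100.1"))
    # Public source address connecting to 127.0.0.1: not local.
    print(looks_local("203.0.113.10", "127.0.0.1"))
```

Under this rule, a connection whose source is a public address and whose
destination is 127.0.0.1 would be classified as non-local even though both
ends are on the same host.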

> Any advice/help would be appreciated.
> 
> Thanks,
> Leonard
> ---
> 
> 
> Environment;
> #############################################
> Ubuntu 10.04 LTS
> Linux web1 2.6.32-37-server #81-Ubuntu SMP Fri Dec 2 20:49:12 UTC 2011 x86_64 GNU/Linux
> Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:16:16] [rq:16] [async-threads:0] [kernel-poll:false]
> 
> 
> File: /etc/hostname
> #############################################
> web1
> 
> File: /etc/hosts
> #############################################
> 127.0.0.1       web1 localhost
> 
> 192.168.100.1   web1 web1.XXXXXX.XXX
> 192.168.100.83  web2 web2.XXXXXX.XXX
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1     localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 
> 
> File: /etc/network/interfaces
> #############################################
> # The loopback network interface
> auto lo
> iface lo inet loopback
> 
> auto bond0
> iface bond0 inet static
>         address XX.XX.XX.XX
>         netmask 255.255.255.240
>         gateway xx.xx.xx.xx
>         bond-slaves eth0 eth1
>         bond_mode 802.3ad
>         bond_miimon 100
>         bond_lacp_rate 1
> auto bond1
> iface bond1 inet static
>         address 192.168.100.1
>         netmask 255.255.255.0
>         bond-slaves eth2 eth3
>         bond_mode 802.3ad
>         bond_miimon 100
>         bond_lacp_rate 1
> auto bond1:0
> iface bond1:0 inet static
>         address 192.168.100.2
>         netmask 255.255.255.0
> 
> auto bond1:1
> iface bond1:1 inet static
>         address 192.168.100.3
>         netmask 255.255.255.0
> 
> TCP Dump (sanitized XX.XX.XX.XX for public IP);
> #############################################
> leonard@REDACTED:~$ sudo tcpdump -i lo -vv port 4369
> tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
> 14:41:27.873868 IP (tos 0x0, ttl 64, id 61955, offset 0, flags [DF], proto TCP (6), length 60)
>     XX.XX.XX.XX.42982 > web1.4369: Flags [S], cksum 0xba6b (correct), seq 2754024620, win 32792, options [mss 16396,sackOK,TS val 24962735 ecr 0,nop,wscale 7], length 0
> 14:41:27.873884 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     web1.4369 > web1.42982: Flags [S.], cksum 0xeb9d (correct), seq 4276970213, ack 2754024621, win 32768, options [mss 16396,sackOK,TS val 24962735 ecr 24962735,nop,wscale 7], length 0
> 14:41:27.873895 IP (tos 0x0, ttl 64, id 61956, offset 0, flags [DF], proto TCP (6), length 52)
>     XX.XX.XX.XX.42982 > web1.4369: Flags [.], cksum 0x5897 (correct), seq 2754024621, ack 4276970214, win 257, options [nop,nop,TS val 24962735 ecr 24962735], length 0
> 14:41:27.873938 IP (tos 0x0, ttl 64, id 61957, offset 0, flags [DF], proto TCP (6), length 82)
>     XX.XX.XX.XX.42982 > web1.4369: Flags [P.], seq 0:30, ack 1, win 257, options [nop,nop,TS val 24962735 ecr 24962735], length 30
> 14:41:27.873945 IP (tos 0x0, ttl 64, id 33865, offset 0, flags [DF], proto TCP (6), length 52)
>     web1.4369 > web1.42982: Flags [.], cksum 0xd3a4 (correct), seq 1, ack 31, win 256, options [nop,nop,TS val 24962735 ecr 24962735], length 0
> 14:41:27.874143 IP (tos 0x0, ttl 64, id 33866, offset 0, flags [DF], proto TCP (6), length 52)
>     web1.4369 > web1.42982: Flags [F.], cksum 0xd3a3 (correct), seq 1, ack 31, win 256, options [nop,nop,TS val 24962735 ecr 24962735], length 0
> 14:41:27.874188 IP (tos 0x0, ttl 64, id 61958, offset 0, flags [DF], proto TCP (6), length 52)
>     XX.XX.XX.XX.42982 > web1.4369: Flags [F.], cksum 0x5877 (correct), seq 30, ack 2, win 257, options [nop,nop,TS val 24962735 ecr 24962735], length 0
> 14:41:27.874202 IP (tos 0x0, ttl 64, id 33867, offset 0, flags [DF], proto TCP (6), length 52)
>     web1.4369 > web1.42982: Flags [.], cksum 0xd3a2 (correct), seq 2, ack 32, win 256, options [nop,nop,TS val 24962735 ecr 24962735], length 0
> 
> 
> 
> epmd output;
> #############################################
> root@REDACTED:/usr/local/src# epmd -d -d -d
> epmd: Fri Jan 13 14:41:25 2012: epmd running - daemon = 0
> epmd: Fri Jan 13 14:41:25 2012: try to initiate listening port 4369
> epmd: Fri Jan 13 14:41:25 2012: entering the main select() loop
> epmd: Fri Jan 13 14:41:27 2012: Non-local peer connected
> epmd: Fri Jan 13 14:41:27 2012: time in seconds: 1326483687
> epmd: Fri Jan 13 14:41:27 2012: opening connection on file descriptor 4
> epmd: Fri Jan 13 14:41:27 2012: time in seconds: 1326483687
> epmd: Fri Jan 13 14:41:27 2012: got 30 bytes
> ***** 00000000  00 1c 78 b8 4a 4d 00 00  05 00 05 00 0f 73 6f 6d  |..x.JM.......som|
> ***** 00000010  65 73 74 72 61 6e 67 65  6e 61 6d 65 00 00        |estrangename..|
> epmd: Fri Jan 13 14:41:27 2012: time in seconds: 1326483687
> epmd: Fri Jan 13 14:41:27 2012: ** got ALIVE2_REQ
> epmd: Fri Jan 13 14:41:27 2012: ALIVE2_REQ from non local address
> epmd: Fri Jan 13 14:41:27 2012: closing connection on file descriptor 4
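
For reference, the 30 bytes in the dump above can be decoded by hand. The
sketch below assumes the ALIVE2_REQ layout given in the Erlang distribution
protocol documentation (a 2-byte length, then tag 120, port, node type,
protocol, highest and lowest version, name length, name, extra length,
extra). The function and dictionary key names are made up for this
illustration.

```python
import struct

ALIVE2_REQ = 120  # request tag, as listed in the distribution protocol docs


def parse_alive2_req(packet: bytes) -> dict:
    """Decode a length-prefixed ALIVE2_REQ packet into its fields."""
    (length,) = struct.unpack_from(">H", packet, 0)
    body = packet[2:2 + length]
    # Fixed-size header: tag, port, node type, protocol, versions, name length.
    tag, port, node_type, protocol, highest, lowest, nlen = struct.unpack_from(
        ">BHBBHHH", body, 0
    )
    if tag != ALIVE2_REQ:
        raise ValueError("not an ALIVE2_REQ packet (tag %d)" % tag)
    name = body[11:11 + nlen].decode("latin-1")
    (elen,) = struct.unpack_from(">H", body, 11 + nlen)
    extra = body[13 + nlen:13 + nlen + elen]
    return {
        "port": port,
        "node_type": node_type,
        "protocol": protocol,
        "highest_version": highest,
        "lowest_version": lowest,
        "name": name,
        "extra": extra,
    }


# The 30 bytes copied from the epmd debug output above.
DUMP = bytes.fromhex(
    "001c78b84a4d0000050005000f736f6d"
    "65737472616e67656e616d650000"
)

if __name__ == "__main__":
    for key, value in parse_alive2_req(DUMP).items():
        print(key, value)
```

Decoded this way, the request is a well-formed registration for the node
name "somestrangename", which suggests the connection is being closed
because of the address check rather than a malformed packet.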


