gen_server:call across nodes hangs indefinitely, net_adm:ping works
Fri Oct 22 15:59:10 CEST 2010
One of our customers is seeing some weird issues with RabbitMQ, and it's
beginning to look like an Erlang problem.
They have a two-node system, with both nodes running R13B04 on
virtualised CentOS 5. Both VMs use iptables as a firewall, with the
epmd port open plus one other port for inter-node message passing
(inet_dist_listen_min and inet_dist_listen_max are set to the same
value to limit distribution to that exact port).
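For reference, pinning the distribution port like this is usually done
through the kernel application's environment. A minimal sketch, assuming
a sys.config-style file; the port number 9100 is a made-up placeholder,
not the customer's actual value:

```erlang
%% Hypothetical config fragment: port 9100 is an assumed example value.
[
 {kernel, [
   %% Setting min and max to the same value pins the distribution
   %% listener to a single port. The firewall must then allow that
   %% port in addition to the epmd port (4369 by default).
   {inet_dist_listen_min, 9100},
   {inet_dist_listen_max, 9100}
 ]}
].
```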
After the system has been up for a number of hours / days, they're
seeing RabbitMQ hanging in various ways. Investigation of the system in
this state shows that:
* epmd is up and working
* epmd -d -names gives the expected results
* net_adm:ping/1 from either node to the other works
* any Rabbit APIs that invoke gen_server:call/3 locally work
* any Rabbit APIs that invoke gen_server:call/3 across nodes hang
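The ping-versus-call comparison above could be repeated without going
through any Rabbit API, roughly as follows. This is only a sketch: the
module name dist_check and the function probe/2 are invented for
illustration, and it uses rpc:call/5 (which goes through the rex server
on the remote node) as a stand-in for a cross-node gen_server:call, with
a timeout so the shell does not hang:

```erlang
-module(dist_check).
-export([probe/2]).

%% Hypothetical helper, not part of RabbitMQ or OTP.
%% Probes Node in two ways and returns both results as a proplist:
%%   ping - result of net_adm:ping/1 (pong or pang)
%%   rpc  - result of a bounded rpc:call/5 asking the remote node
%%          for its own name
probe(Node, TimeoutMs) ->
    Ping = net_adm:ping(Node),
    %% On a healthy link this returns Node itself; on failure or
    %% timeout rpc:call/5 returns a {badrpc, Reason} tuple instead
    %% of blocking forever.
    Rpc = rpc:call(Node, erlang, node, [], TimeoutMs),
    [{ping, Ping}, {rpc, Rpc}].
```

In the state described above, I would expect something like
dist_check:probe('rabbit@otherhost', 5000) (node name assumed) to report
pong for the ping together with a {badrpc, ...} tuple for the call,
whereas a healthy pair of nodes should report pong and the remote node's
name.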
We've also seen a weird-looking error pop up around the same time
(user_sup dies with "eio", see attached log), although I'm unclear as to
whether this is a cause or a symptom.
Unfortunately I'm a long way away from cutting this down to a minimal
test case yet; I can't even replicate this myself. But does this look
like anything anyone's ever seen before?
Staff Engineer, RabbitMQ
SpringSource, a division of VMware
[Attachment scrubbed by the list archive: non-text, 982 bytes, no
description available]