Hi, <div>We have a cluster of 20+ nodes running R14B04 on Debian Linux (Squeeze). </div><div>Suddenly we started having problems where 4-5 nodes would drop out of the cluster, and we would see this in the logs:</div><div><br></div><div>
<div>=ERROR REPORT==== 2012-10-18 10:49:26 ===</div><div>** Node 'x@host-y' not responding **</div><div>** Removing (timedout) connection **</div></div><div><br></div><div>We took crash dumps of some of the nodes and found that error_logger has a queue of these messages:</div>
<div><div><br></div><div>{notify,{error,noproc,</div><div> {emulator,"~s~n",["erts_poll_wait() failed: ebadf (9)\n"]}}}</div></div><div><br></div><div>Any hints? (Other than changing the net_kernel net_ticktime.)</div>
<div><br></div><div>Best Regards, </div><div>Ahmed</div>