[erlang-questions] disconnected nodes

Ignas Vyšniauskas baliulia@REDACTED
Sat Feb 15 12:57:56 CET 2014


Hi Ahmed,

On 10/22/2012 06:13 PM, Ahmed Omar wrote:
> Hi, We have a cluster of 20+ nodes running R14B04 on Linux Debian
> Squeeze. Suddenly we started having problems where 4-5 would drop out
> of the cluster and we would see this in the logs
>
> =ERROR REPORT==== 2012-10-18 10:49:26 ===
> ** Node 'x@REDACTED' not responding **
> ** Removing (timedout) connection **
>
> We made a crash dump of some of the nodes and found error_logger has
> a queue of these messages
>
> {notify,{error,noproc, {emulator,"~s~n",["erts_poll_wait() failed: ebadf (9)\n"]}}}
>
> Any hints? (other than changing net_kernel net_tick_time)
>
> Best Regards, Ahmed

Sorry to bump this ancient thread, but have you perhaps found a cause
for this?

We're seeing the same thing during overloads. We're running R15B03,
though, and in addition to the EBADFs we also get EINVAL failures, i.e.:

    erts_poll_wait() failed: einval

Have you maybe figured out anything specific causing the EBADFs?

Thanks,
Ignas


