[erlang-questions] Erlang VM hanging on node death

Steve Cohen scohen@REDACTED
Mon Jul 10 21:14:07 CEST 2017


Hi all,

We have 12 nodes in our guilds cluster, each running 500,000 processes.
We have another cluster, called sessions, that has 15 nodes with roughly
four million processes on it. Both clusters are in the same Erlang
distribution, since our guilds monitor sessions and vice versa.

Now, when one of our guild servers dies, as expected it generates a large
number of DOWN messages to the sessions cluster. These messages bog down
the sessions servers (obviously) while they process them, but when they're
done processing, distribution appears to be completely broken.

By broken, I mean that the nodes were disconnected from one another, they
weren't exchanging messages, CPU usage was 0, and we couldn't even launch
the remote console.

I can't imagine this is expected behavior, and I was wondering if someone
could shed some light on it. We're open to the idea that we're doing
something very, very wrong.


Thanks in advance for the help

-- 
Steve Cohen

