[erlang-questions] Intermittent failures *reconnecting* C hidden nodes
Andy Sloane
andy.sloane@REDACTED
Tue Jul 10 00:20:02 CEST 2007
On 7/9/07, David Hopwood <david.hopwood@REDACTED> wrote:
> Perhaps when the node comes back up with a different name, some
> assumption made by the distribution protocol is being violated. This
> is just speculation, though; I don't know the protocol in detail.
I thought of that, and upon further reflection it makes at least a
little sense -- the name would determine who connects to whom in cases
where there might be some ambiguity. In the case of hidden nodes,
though, that shouldn't really apply -- our C nodes always have the same
name every time they come back up.
Regardless, patching up to R11B-5 seems to have solved this issue; at
least, it has not recurred yet. So... please disregard my clamoring!
As for the other issues I mentioned: our beam instance ran itself out
of memory this morning, leaving an incredibly detailed crash dump
which has been quite helpful so far -- thank you for making it
mostly-human-readable! But the memory usage numbers don't add up: it
had 2.1 gigs allocated and tried to allocate an additional 900-some
megs, yet the largest Stack+Heap size is about 130 megs (all backlogged
messages), and none of the other processes were above 20k.
I'll post a new thread after some more investigation.
One other thing: the crash dump may have revealed references to
process ids on dead C nodes, which may be a source of many of our
problems.
-Andy