[erlang-questions] Intermittent failures *reconnecting* C hidden nodes

Andy Sloane andy.sloane@REDACTED
Tue Jul 10 00:20:02 CEST 2007


On 7/9/07, David Hopwood <david.hopwood@REDACTED> wrote:
> Perhaps when the node comes back up with a different name, some
> assumption made by the distribution protocol is being violated. This
> is just speculation, though; I don't know the protocol in detail.

I thought of that, and upon further reflection it makes a little sense
at least -- it would determine who connects to whom in cases where
there might be some ambiguity.  In the case of hidden nodes, though,
it shouldn't really apply -- our C nodes always have the same name
every time they come back up.

Regardless, patching up to R11B-5 seems to have solved this issue; at
least, it has not recurred yet.  So... please disregard my clamoring!

As for the other issues I mentioned: our beam instance ran itself out
of memory this morning, leaving an incredibly detailed crash dump
which has been quite helpful so far -- thank you for making it
mostly-human-readable!  But the memory usage numbers don't add up: it
had 2.1 GB allocated and tried to allocate another 900-odd MB, yet the
largest Stack+Heap size is about 130 MB (all backlogged messages), and
none of the other processes were above 20k.
I'll post a new thread after some more investigation.
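
To put rough numbers on that gap, here is a back-of-the-envelope sketch
in Python.  It only restates the figures quoted above; the units (bytes
rather than words), the rounding of "900some" down to 900, and the
process counts are assumptions, since the dump itself is not reproduced
here.

```python
# Rough accounting of the figures quoted above; every value is approximate.
MB = 1024 * 1024

allocated = int(2.1 * 1024 * MB)   # "2.1 gigs" already allocated
failed_request = 900 * MB          # the additional allocation that failed (assumed ~900 MB)
largest_process = 130 * MB         # largest Stack+Heap in the dump (assumed to be bytes)
other_process_cap = 20 * 1024      # "20k" per remaining process (assumed to mean 20 KB)


def unaccounted(num_other_processes):
    """Bytes of the allocation not explained by process stacks and heaps."""
    explained = largest_process + num_other_processes * other_process_cap
    return allocated - explained


# Hypothetical process counts; the real count is not given in this post.
for n in (1000, 10000):
    print(n, "other processes:", round(unaccounted(n) / MB), "MB unaccounted for")
```

Even with a generous guess at the number of small processes, most of
the allocation is not explained by process heaps, so the remainder
would have to sit somewhere else.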

One other thing: the crash dump may have revealed references to
process ids on dead C nodes, which may be a source of many of our
problems.

-Andy


