VM silently exits
Wed Jul 28 22:29:44 CEST 2010
With R13B04 running on MontaVista Linux, I've seen a few cases
recently where the Erlang VM simply exits without any log messages,
crash dumps, or core dumps. It appears to happen only after days of
running under load, which makes it hard to reproduce and investigate. It
may be related to memory consumption, but I'm not sure.
External programs such as heart and memsup simply report "Erlang has
closed." According to the source code, that means each of them got a
return value of 0 from read() on its connection to the VM, which would
seem to indicate that the VM side of the connection was simply closed.
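To make the read() reasoning concrete, here is a minimal sketch, assuming a plain pipe or socket between a helper and the VM. It is not taken from the heart or memsup sources, and the helper name read_from_peer is hypothetical. It shows the three possible outcomes of read(): data, 0 for end-of-file, or -1 for an error. One caveat: the kernel closes all of a process's descriptors when it dies for any reason, so an EOF by itself looks the same whether the VM called exit() or was killed by a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Possible outcomes of a single read() on a pipe or socket to a peer. */
enum peer_status { PEER_DATA, PEER_CLOSED, PEER_ERROR };

/* Hypothetical helper (not from heart or memsup): read once from fd and
 * classify the result. len must be > 0, because read() with len == 0
 * also returns 0 without meaning end-of-file.
 *   > 0 bytes -> PEER_DATA
 *   0         -> PEER_CLOSED: end-of-file, the peer closed its end
 *                (the kernel does this when the peer process dies)
 *   -1        -> PEER_ERROR (reads interrupted by a signal are retried) */
static enum peer_status read_from_peer(int fd, char *buf, size_t len,
                                       ssize_t *nread)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    *nread = n;
    if (n > 0)
        return PEER_DATA;
    if (n == 0)
        return PEER_CLOSED;
    return PEER_ERROR;
}
```

Closing the write end of a pipe, or letting the writing process exit, is enough to make the next read() on the other end return 0 once any buffered data has been consumed.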
Taken together, this would seem to indicate that the VM was either
killed by another process or called exit() itself. The Linux "OOM
killer" is disabled on this system, and I don't know of any other
process that would be killing the VM. There are no alarms in the logs
about hitting memory high watermarks or anything similar, and we
aren't using any options to change allocators either.
Anybody ever seen anything like this?
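Since heart and memsup only see the closed connection, telling "killed" apart from "called exit()" requires the VM's parent process. A minimal sketch, assuming a hypothetical launcher that stays the VM's parent (this may not hold if the VM is started with -detached) and calls waitpid() on it: the standard POSIX macros WIFEXITED and WIFSIGNALED decode the status. The function and enum names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* How a child process ended, decoded from a waitpid() status. */
enum child_end { CHILD_OTHER = 0, CHILD_EXITED = 1, CHILD_SIGNALED = 2 };

/* Hypothetical helper: decode status and write a one-line description
 * into out.
 *   CHILD_EXITED:   the child called exit()/_exit() or returned from main
 *   CHILD_SIGNALED: the child was terminated by a signal, e.g. SIGKILL
 *                   sent by another process
 *   CHILD_OTHER:    anything else (e.g. stopped), not expected here */
static enum child_end describe_child_end(int status, char *out, size_t outlen)
{
    if (WIFEXITED(status)) {
        snprintf(out, outlen, "exited with status %d", WEXITSTATUS(status));
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        snprintf(out, outlen, "terminated by signal %d", WTERMSIG(status));
        return CHILD_SIGNALED;
    }
    snprintf(out, outlen, "unexpected wait status 0x%x", (unsigned)status);
    return CHILD_OTHER;
}
```

A launcher would fork, exec the VM in the child, and log the description produced from the waitpid() status. A signal number would point at another process (or the kernel), while an exit status would point at an exit() call inside the VM.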
I've found a few places in the VM C source code where exit() is called
without logging anything. Some of these are normal exits, such as
leaving an Erlang shell, where no logging is needed. Others, however,
seem to be error conditions, and those should log something. I'll
probably have to patch my system to add logging in those cases to track
down this problem. Is there still time to get a patch like this into
R14B? If this issue is memory-related, a sudden increase in memory
consumption could perhaps make the VM exit between alarm checks, which
would explain why things like memsup don't seem to notice. That makes
it fairly critical for the VM itself to log something in such cases.
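The kind of patch described above might look roughly like this. It is a minimal sketch, assuming that each error-path exit(status) is replaced by a call to a hypothetical LOGGED_EXIT macro. Neither the macro nor the functions below exist in the Erlang runtime. The record goes to syslog as well as stderr, because stderr may go nowhere when the VM runs in the background.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>

/* Hypothetical helper, not part of the Erlang runtime: build a one-line
 * record of where and why the process is exiting. Kept separate from the
 * exit itself so the formatting can be tested. Returns snprintf's result. */
static int format_exit_record(char *out, size_t outlen, const char *file,
                              int line, int status, const char *reason)
{
    return snprintf(out, outlen, "emulator exit(%d) at %s:%d: %s",
                    status, file, line, reason);
}

/* Hypothetical drop-in for a bare exit(status) on an error path: write the
 * record to syslog and stderr before exiting, so the exit leaves a trace
 * even when no crash dump or core file is produced. */
static void logged_exit(const char *file, int line, int status,
                        const char *reason)
{
    char msg[256];
    format_exit_record(msg, sizeof msg, file, line, status, reason);
    syslog(LOG_ERR, "%s", msg);
    fprintf(stderr, "%s\n", msg);
    exit(status); /* exit() also flushes any buffered stdio streams */
}

/* Records the call site automatically via __FILE__ and __LINE__. */
#define LOGGED_EXIT(status, reason) \
    logged_exit(__FILE__, __LINE__, (status), (reason))
```

An error path that currently reads `exit(1);` would become something like `LOGGED_EXIT(1, "short description of the failure");`, leaving normal exits such as quitting the shell untouched.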