[erlang-questions] heart prevents beam from creating crash dumps
Richard Carlsson
carlsson.richard@REDACTED
Sat Aug 25 21:39:05 CEST 2012
We have had a long-standing problem with not getting any Erlang crash
dumps at all on our live servers. I finally figured out why it happens.
I have already reported this to the OTP folks, but I thought I should
send a summary to the mailing lists for documentation and to give people
a heads-up.
The problem occurs when you start Erlang with the -heart flag
(http://www.erlang.org/doc/man/heart.html). This spawns a small external
C program connected through a port. From Erlang's point of view it's
like any other port program. The heart program pings the Erlang side
every now and then, and if it gets no reply within HEART_BEAT_TIMEOUT
seconds, or if the connection to Erlang breaks, it assumes the Beam
process has gone bad and kills it off with a SIGKILL, and then restarts
Erlang using whatever HEART_COMMAND is set to. So far so good.
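To make the heart behaviour above concrete, here is a minimal sketch of such a watchdog. It is written in Python only because the os, select and subprocess modules are thin wrappers over the same POSIX calls a C program would use. It is not the real heart source. The function names watch_heartbeats and kill_and_restart are hypothetical, and the sketch assumes heartbeats arrive as bytes on a pipe and that the restart command is passed as an argument list.

```python
import os
import select
import signal
import subprocess


def watch_heartbeats(fd: int, timeout_s: float) -> str:
    """Hypothetical heart-style watchdog loop (not the real heart.c).

    Returns "timeout" if no heartbeat arrives on fd within timeout_s
    seconds, or "broken" if the other end of the pipe has been closed.
    """
    while True:
        # Each call waits a full timeout_s, so every heartbeat restarts the clock.
        ready, _, _ = select.select([fd], [], [], timeout_s)
        if not ready:
            return "timeout"  # the Erlang side did not answer in time
        if os.read(fd, 64) == b"":
            return "broken"   # EOF: the connection to Erlang is gone
        # Otherwise a heartbeat arrived; keep watching.


def kill_and_restart(beam_pid: int, restart_cmd: list[str]) -> int:
    """Hypothetical: the next step described in the post.

    Kill the emulator with SIGKILL, then run the restart command
    (standing in for HEART_COMMAND) and return its exit status.
    """
    os.kill(beam_pid, signal.SIGKILL)
    return subprocess.run(restart_cmd).returncode
```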
Normally, when Beam detects a critical situation (e.g., out of memory)
and decides to shut down, it will create an erl_crash.dump file (or
whatever ERL_CRASH_DUMP is set to). This information can be a great help
in figuring out what went wrong. But if the system that crashed was large,
the crash dump file can take quite a long time to create. In order to
make it possible to restart the node (reusing the node name) while the
old defunct system is still writing the crash dump, Beam wants to drop
its connection to the EPMD service before it starts writing the dump,
making it look like the old node has disappeared.
The code that does this is the function prepare_crash_dump() in
erts/emulator/sys/unix/sys.c. The problem from the perspective of the C
code is that the connection to EPMD is on some unknown file descriptor
(just like heart, this has been started as a port from Erlang code). The
solution they chose, and which has been part of the OTP system for
years, is to close _all_ file descriptors except 0-2. This certainly has
the desired effect that EPMD releases the node name for reuse. But when
the loop reaches file descriptor 10 or thereabouts (the exact number
probably depends on your system), it also breaks the connection to the
heart program.
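The failure mode can be reproduced in a few lines. The sketch below (again Python over POSIX calls; the function name simulate_crash_dump_prep is hypothetical) forks a stand-in "Beam" that holds the write end of a pipe, has it close every descriptor from 3 upwards, and lets the parent play heart. Two assumptions: an upper limit of 1024 stands in for "all" descriptors, and the topology is swapped for convenience (here heart is the parent process rather than a port program). The pipe semantics are the same either way: the reader sees end-of-file as soon as the last write end is closed, even though the writer is still alive.

```python
import os
import signal
import time


def simulate_crash_dump_prep(max_fd: int = 1024) -> tuple[bool, bool]:
    """Hypothetical demo. Returns (link_broken, beam_still_alive) as seen
    by the stand-in heart at the moment it notices the broken pipe."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the crashing "Beam". It holds w, its link to "heart".
        try:
            # Analogue of the loop in prepare_crash_dump(): close every
            # descriptor except 0-2 (assumes w < max_fd).
            os.closerange(3, max_fd)
            time.sleep(5)  # pretend to be busy writing the crash dump
        finally:
            os._exit(0)
    # Parent: the stand-in "heart". Keep only the read end.
    os.close(w)
    link_broken = os.read(r, 1) == b""  # blocks until the child closes w
    beam_still_alive = os.waitpid(pid, os.WNOHANG) == (0, 0)
    os.close(r)
    os.kill(pid, signal.SIGKILL)  # what heart does on a broken pipe
    os.waitpid(pid, 0)            # reap the killed child
    return link_broken, beam_still_alive
```

On a Unix system this should return (True, True): the link breaks while the child is still alive, which is exactly the moment heart would send SIGKILL.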
In these multicore days, the effect is almost instantaneous. The heart
program immediately wakes up due to the broken pipe and sends SIGKILL to
Beam for good measure, to make sure it's really gone, and then it starts
a new Erlang node. Meanwhile, the old node is still busy closing file
descriptors. Sometimes it makes it as far as 12 before SIGKILL arrives.
The poor thing never has a chance to even open the crash dump file for
writing. And your operations people only see a weird restart without any
further clues.
I don't have a good solution right now, except "don't use -heart". And
it might be that one wants to separate the automatic restarting of a
crashed node from the automatic killing of an unresponsive node anyway.
Suggestions are welcome.
/Richard