[erlang-questions] Sudden death of Erlang Node
Valentin Micic
valentin@REDACTED
Fri Jan 19 15:52:22 CET 2007
Serge Aleynikov wrote:
>
> What caused the emulator to close that file descriptor (aside from
> memory exhaustion) is something that has kept bothering me for a while,
>
Quite some time ago I asked a similar question in a slightly different
context. In my case, an Erlang node running R9 would close a listening
socket (file descriptor) that was advertised via epmd, with the consequence
that nobody from outside could connect to the node. The node itself would
happily keep crunching its numbers. Interestingly, this happened at the
same time every day, always on the same node -- enough for us to conclude
that it had to be network related, with a particular OS patch level and
lunar phases helping... Out of desperation, we compiled the run-time for
this particular OS patch level, using a newer version of the compiler, and,
to my surprise, the problem hasn't occurred since. Out of curiosity, does
your run-time report to stdout something like "driver went away without
deselecting..." or some similar phrase?
* * *
On the other hand, Frederik noticed something very valid: 25% on a quad-CPU
machine is 100% of a single CPU. Depending on the particular OS version, the
kernel may always schedule beam on a single CPU, and when this happens, the
heart process may not receive its heartbeat on time...
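
To make the arithmetic concrete, here is a minimal Python sketch. It assumes
the monitoring tool reports utilisation averaged across all CPUs, and the
function name is only for illustration:

```python
def busiest_core_share(total_percent: float, cores: int) -> float:
    """Worst-case load on one core, in percent.

    Assumes total_percent is an average over all cores and that the
    whole load may be pinned to a single core (as a single-threaded
    beam would be).
    """
    return min(100.0, total_percent * cores)


# 25% reported on a four-CPU machine can mean one CPU is fully busy.
print(busiest_core_share(25.0, 4))
```

So a figure that looks harmless in the aggregate can still mean that the one
CPU running beam has no headroom left.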
* * *
What's your disk I/O like? I've noticed very strange behaviour on beams
started with a single thread (i.e. without the +A n option) and running
dets-intensive applications. Under heavy traffic, beam spends too much time
waiting for I/O, thus delaying process scheduling and message processing. We
had such a situation (a huge mnesia database spread over multiple dets files
with relatively high I/O), and we solved it by starting additional threads.
On pre-SMP Erlang, the thread pool is used to support port drivers
(including disk I/O), thus enabling the "main" thread to keep scheduling even
when the disk is busy. However, if you are running 32-bit Erlang, do not get
carried away with the number of threads, because you could easily run out of
memory.
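
As a rough way to reason about the memory side, the sketch below estimates
how much of a 32-bit address space thread stacks alone would reserve. All
figures are placeholders chosen for illustration (8 MiB per stack, 3 GiB
usable by the process), not measured Erlang defaults; substitute the values
that apply to your OS and emulator settings.

```python
MIB = 1024 * 1024


def thread_stack_share(threads: int, stack_bytes: int,
                       address_space_bytes: int) -> float:
    """Fraction of the address space reserved by thread stacks alone."""
    return threads * stack_bytes / address_space_bytes


# Hypothetical figures: 128 threads, 8 MiB per stack, 3 GiB address space.
share = thread_stack_share(threads=128,
                           stack_bytes=8 * MIB,
                           address_space_bytes=3072 * MIB)
print(f"{share:.0%} of the address space goes to thread stacks")
```

Under those assumed numbers about a third of the address space is gone
before the node has allocated anything for its own data.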
V.