[erlang-questions] Sudden death of Erlang Node

Valentin Micic
Fri Jan 19 15:52:22 CET 2007


Serge Aleynikov wrote:

>
> What caused the emulator to close that file descriptor (aside from
> memory exhaustion) is something that have kept bothering me for a while,
>
Quite some time ago I asked a similar question in a slightly
different context: in my particular case, an Erlang node running R9 would
close a listening socket (file descriptor) that was advertised via epmd,
with the consequence that nobody from outside could connect to the node. The
node itself would happily crunch its numbers away. Interestingly, this
happened at the same time every day, always on the same node -- enough for
us to conclude that it had to be network related, with a particular OS patch
level helped along by lunar phases... Out of desperation, we compiled the
run-time for this particular OS patch level using a newer version of the
compiler and, to my surprise, the problem hasn't occurred since. Out of
curiosity, does your run-time report to stdout something like: "driver went
away without deselecting..." or some similar phrase?

* * *

On the other hand, Frederik noticed something very valid: 25% on a quad-CPU
machine is 100% of a single CPU. Depending on the particular OS version, the
kernel may always schedule beam on a single CPU, and when this happens, the
heart process may not receive its heartbeat on time...
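
To make the arithmetic explicit, here is a minimal Python sketch. The
function name and the assumption that the whole load sits on one CPU are
mine, for illustration only; they are not taken from any monitoring tool.

```python
def single_cpu_equivalent(overall_percent: float, cpus: int) -> float:
    """Load on one CPU if the whole machine-wide load were concentrated there.

    Assumes the reported figure is an average over all CPUs and that a
    single-threaded emulator is the only significant consumer.
    """
    return overall_percent * cpus


if __name__ == "__main__":
    # 25% machine-wide on four CPUs means one CPU is fully saturated.
    print(single_cpu_equivalent(25.0, 4))  # 100.0
```

So a machine-wide reading that looks harmless can still mean the one CPU
running beam has no headroom left.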

* * *

What's your disk I/O like? I've noticed very strange behaviour on beams
started with a single thread (i.e. without the +A n option) and running
dets-intensive applications. Under heavy traffic, beam spends too much time
waiting for I/O, thus delaying process scheduling and message processing. We
had such a situation (a huge mnesia database spread over multiple dets files
with relatively high I/O), and we solved it by starting additional threads.
On pre-SMP Erlang, the thread pool was used to support port drivers
(including disk I/O), thus enabling the "main" thread to run scheduling even
when the disk is busy. However, if you are running 32-bit Erlang, do not get
carried away with the number of threads, because you could easily run out of
memory.
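
The effect can be illustrated with a toy model. This is a Python analogy,
not how the emulator is actually implemented: the 50 ms delay, the function
names and the "tick" counter are all assumptions made for the sketch. A
pool size of 0 stands in for a beam started without +A n, where the main
loop performs each blocking operation itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.05  # assumed duration of one blocking disk operation, in seconds


def slow_disk_read() -> None:
    """Stand-in for a blocking dets/file operation."""
    time.sleep(IO_DELAY)


def ticks_while_io(pool_size: int, jobs: int = 4) -> int:
    """Count the scheduling 'ticks' the main loop completes while I/O is pending.

    pool_size == 0: the main loop does the I/O itself and cannot tick.
    pool_size > 0: I/O is handed to worker threads and the main loop keeps going.
    """
    ticks = 0
    if pool_size == 0:
        for _ in range(jobs):
            slow_disk_read()  # main loop is stuck here until the read returns
        return ticks
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        pending = [pool.submit(slow_disk_read) for _ in range(jobs)]
        while not all(f.done() for f in pending):
            ticks += 1  # main loop is free to "schedule processes"
            time.sleep(0.001)
    return ticks


if __name__ == "__main__":
    print("ticks without pool:", ticks_while_io(0))
    print("ticks with pool of 4:", ticks_while_io(4))
```

Without a pool the main loop gets no ticks at all while the disk is busy;
with a pool it keeps ticking. Each worker thread needs its own stack,
which is why a very large pool gets expensive in a 32-bit address space.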

V.



