[erlang-questions] Why Beam.smp crashes when memory is over?

Sun Nov 8 22:26:43 CET 2009

Erlang needs to allocate memory in any number of situations.  For
example, suppose Erlang tried to tell your code that it had run out
of memory.  Should it generate a message?  Should it call a function?
Should it raise an exception?  Guess what all of these have in
common?  They allocate memory (which isn't there).

You can try to work around this.  You can reserve some memory just
for this.  However, there's still no obvious place for the error to
be delivered.  There is not a very good chance that the failed
allocation will happen in the process that is holding all of the
memory.  If you take the Linux OOM approach, you would have to scan
all of the processes, weigh them, mix in some randomness, and message
the one you picked.  There's no memory left for the VM to think much
about the problem.  Even if you killed that process, that would just
trigger its supervisor to restart it, even though we may not have
actually stopped the memory leak.
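
To make the "scan and weigh" step concrete, here is a minimal sketch
of what it could look like from inside Erlang.  The module and
function names (oom_sketch, heaviest/1) are made up for illustration;
only erlang:processes/0, erlang:process_info/2 and the lists functions
are real.  Note that the scan builds and sorts a list, so it allocates
memory itself, which is exactly the problem when none is left.

```erlang
-module(oom_sketch).
-export([heaviest/1]).

%% Hypothetical helper: return up to N {Bytes, Pid} pairs, largest
%% memory footprint first.
heaviest(N) when is_integer(N), N >= 0 ->
    %% process_info/2 returns 'undefined' for a process that died
    %% during the scan; the generator pattern silently skips those.
    Weighed = [{Bytes, Pid}
               || Pid <- erlang:processes(),
                  {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    %% Sorting allocates too, so this only works while memory remains.
    lists:sublist(lists:reverse(lists:sort(Weighed)), N).
```

Calling oom_sketch:heaviest(5) from the shell lists the five largest
processes.  It says nothing about which one is actually leaking, and
killing the top entry would only hand the job back to its supervisor.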

Worse, this means that an "out of memory" error can happen anywhere
and must be handled everywhere, even in the supervisors.  Patching the
supervisors to reliably handle this would be insane.  Suddenly,  
reliability under load becomes impossible to guarantee.

Even if you emulate Linux and provide an OOM-killer (i.e. kill  
processes based on randomness + heuristics to detect runaway  
processes), you introduce tons of random behavior into the VM, when a  
VM restart would be recognizable, loggable, and generally easier to  
debug.

Exposing those errors creates an ugly situation.  This extra error
handling would cause an explosion of corner cases, decreases in
reliability, and volumes of code (i.e. where bugs live).  Inside of
Erlang, the philosophy is to use supervisors and to write daemons
that can recover from a restart.  Heartbeat gives the same behavior
for the entire VM.  It's a philosophical design choice to let
critical faults surface and recover from them by restarting, rather
than to mask them.  That really is better than trying to handle an
out-of-memory error in place.
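
As a rough illustration of the supervisor side of that philosophy,
here is a small sketch.  Everything named restart_sketch, start_worker
and worker_loop is hypothetical; the supervisor callbacks and
proc_lib:spawn_link/1 are standard OTP.  The worker keeps no state
that it cannot rebuild, so a restart costs nothing but time.  For the
whole VM, the equivalent is starting it with "erl -heart" and setting
the HEART_COMMAND environment variable to the command that should
restart the node.

```erlang
-module(restart_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Restart a crashed child, but give up (and fail upward) if it
    %% dies more than 5 times within 10 seconds.
    RestartStrategy = {one_for_one, 5, 10},
    Worker = {worker,                          % child id
              {?MODULE, start_worker, []},     % {Module, Function, Args}
              permanent, 5000, worker, [?MODULE]},
    {ok, {RestartStrategy, [Worker]}}.

%% Called inside the supervisor process, so the new process is linked
%% to the supervisor.
start_worker() ->
    {ok, proc_lib:spawn_link(fun worker_loop/0)}.

%% Stand-in for a real daemon: it dies on 'crash' and ignores
%% everything else.
worker_loop() ->
    receive
        crash  -> exit(simulated_fault);
        _Other -> worker_loop()
    end.
```

Sending crash to the worker makes the supervisor start a fresh one.
If the crashes come faster than the restart limit allows, the
supervisor itself gives up and the failure moves one level up, which
is the same escalation that heart provides at the VM level.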

It seems obvious that there should be a better way to handle OOM,
but it is all devilishly difficult to do in any meaningfully
portable (or useful) way.

On Nov 8, 2009, at 1:02 PM, Max Lapshin wrote:

> On Sun, Nov 8, 2009 at 11:57 PM, Jayson Vantuyl <kagato@REDACTED>  
> wrote:
>> From within Erlang, I don't believe so.
>
> And what are the problems? OS never crashes when memory is over, OOM
> killer does the job well.
> Why should die Erlang VM?

-- 
Jayson Vantuyl
kagato@REDACTED