[erlang-questions] Why Beam.smp crashes when memory is over?

Mon Nov 9 14:11:34 CET 2009

The resource limit system I am designing still follows the same "let it
crash" concept, but it lets you contain the crash in a more controlled way.
You will also be able to report useful information about what is crashing
and when.
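
As a rough illustration of a contained, reportable crash: the prototype
discussed in this thread is not public API, but OTP 19 and later ship a
related per-process spawn option, max_heap_size. The sketch below uses that
option instead. It assumes OTP 19 or newer; the module and function names are
made up for the example. Unlike the prototype described further down, this
limit is not shared with or inherited by child processes.

```erlang
-module(heap_limit_demo).
-export([run/0, run/1]).

%% Spawn a worker with a per-process heap cap and report how it ends.
%% MaxWords is counted in machine words, not bytes.
run() ->
    run(1024 * 1024).

run(MaxWords) ->
    Opts = [monitor,
            {max_heap_size, #{size => MaxWords,
                              kill => true,
                              error_logger => false}}],
    %% With the monitor option, spawn_opt/2 returns {Pid, MonitorRef}.
    {Pid, Ref} = spawn_opt(fun() -> grow([]) end, Opts),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Only the worker died; the caller and the VM keep running.
            {worker_down, Reason}
    after 30000 ->
        %% Safety net for the sketch: give up and clean up the monitor.
        erlang:demonitor(Ref, [flush]),
        exit(Pid, kill),
        timeout
    end.

%% Deliberately leaks memory: the accumulator grows until the cap is hit.
grow(Acc) ->
    grow([erlang:make_ref() | Acc]).
```

Calling heap_limit_demo:run() should return a tuple whose second element is
the worker's exit reason, which the caller can log or act on. Setting
error_logger to true additionally makes the runtime log details about the
killed process.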

Crashes of big systems are sometimes a problem: the restart may take a lot
of time. Nodes must be synchronised, database tables must be repaired, and
so on. I guess you can design this to be light and easy, but that is not
always the case.

/Tony

On 9 nov 2009, at 12.43, Angel Alvarez wrote:

>
> Well, please let me say something.
>
> I'm brand new to this, but some things are pretty clear to me.
>
> The beauty of the Erlang concept is "let it crash" and "don't program
> defensively", so the VM and the underlying hardware are entities that can
> fail. That's it.
>
> So what's the problem?
>
> Joe said...
>
> If you want failure tolerance you need at least two nodes...
>
> From Joe Armstrong's thesis:
>   "...Schneider [60, 59] answered this question by giving three
> properties that he thought a hardware system should have in order to be
> suitable for programming a fault-tolerant system. These properties
> Schneider called:..."
>
> 1. Halt on failure: in the event of an error a processor should halt
>    instead of performing a possibly erroneous operation.
>
> So on memory exhaustion the VM has to die, and another (Erlang) node
> will do the recovery.
>
> That's the role of distribution: not only to spread computations over
> several nodes to enhance performance, but also to provide resilience in
> the presence of fatal (non-correctable) errors.
>
> As an OS process, the VM has to compete with other OS processes, so in a
> shared deployment (a VM running on a server or a desktop) you can't be
> safe against an OOM triggered by other entities.
>
> Such a resource control mechanism will only increase process overhead and
> context switching in the VM.
>
> People new to Erlang will be attracted to the hierarchical decomposition
> of tasks that Joe described in his thesis:
> "If you can't run a certain task, try doing something simpler."
>
> Many languages and VMs are incorporating Erlang's good "multicore"
> features but not Erlang's powerful error handling concept, and you guys
> want to kill the simplicity by adding many defensive capabilities to
> avoid fatality instead of just organizing code to handle such fatality.
>
> What's next? A maximum mailbox message queue limit?
>
> Well, that's all I have to say about that (Forrest Gump).
>
>
> On Monday, 9 November 2009 09:45:10 Tony Rogvall wrote:
>> Interesting discussion!
>>
>> I have been working on a resource system for Erlang for nearly two
>> years now.
>> I have a working (tm) prototype where you can set resource limits like
>> max_processes/max_ports/max_memory/max_time/max_reductions ...
>> The limits are passed with spawn_opt and are inherited by the processes
>> spawned.
>> This means that if you spawn_opt(M,F,A,[{max_memory, 1024*1024}]) the
>> process will be able to use 1M words for itself and its "subprocesses".
>> This also means that the spawner will get 1M less to use (as designed
>> right now). If a resource limit is reached, the process crashes with
>> reason system_limit.
>>
>> There are still some details to work out before a release, but I will
>> try to get it ready before the end of this year.
>>
>> /Tony
>>
>>
>>
>> On 9 nov 2009, at 09.16, Robert Virding wrote:
>>
>>> No.
>>>
>>> There is a major difference between handling OOM in an OS and in the
>>> BEAM. In an OS it is usually at a per-process level that memory runs
>>> out, so it is easy to decide which process to kill so that the OS can
>>> continue. In the BEAM, however, it is the VM as a whole which has run
>>> out of memory, not a specific process. It is, therefore, much more
>>> difficult to work out which process is the culprit and to decide what
>>> to do. For example, it might be that the process which causes the OOM
>>> is not the actual problem process; it might just be the last straw. Or
>>> the actual cause may be that the whole app is generating large binaries
>>> too quickly. Or it might be that the whole app is spawning too many
>>> processes without any one process being the cause. Or ...
>>> In all these cases killing the process which triggered the OOM would be
>>> the Wrong Thing. We found that it was difficult to work out a
>>> reasonable strategy to handle the actual cause, so we decided not to
>>> handle it.
>>>
>>> "Don't catch an error which you can't handle" as the bard put it.
>>>
>>> Robert
>>>
>>> 2009/11/9 Max Lapshin <max.lapshin@REDACTED>
>>>
>>>> Yes, there are techniques to write watchdogs, but my question was: is
>>>> it possible to prevent the Erlang VM from crashing?
>>>>
>>>> ________________________________________________________________
>>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>>> erlang-questions (at) erlang.org
>>>>
>>>>
>>
>
>
> This email contains no drawings. The strange shapes on the screen are
> letters.
> __________________________________________
>
> Clist UAH a.k.a Angel
> __________________________________________
> Warning: Microsoft_bribery.ISO contains OOXML code.