[erlang-questions] Why Beam.smp crashes when memory is over?

Angel Alvarez clist@REDACTED
Mon Nov 9 14:54:10 CET 2009


Well, there are still many issues with this new approach.

Where are the mailboxes of processes located?

With a heap per process...

Couldn't you trigger a memory exception in a remote process just by sending it one message
when the process has almost used up its reserved memory?
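
To make the mailbox question concrete: in the stock VM, as far as I know, a message sent to a local process is copied into memory owned by the receiver, so a process that does not read its mailbox should see its reported size grow. A minimal sketch to observe this; the module name `mailbox_probe` and the function `fill/1` are made up for illustration, and the exact memory accounting may differ between ERTS versions:

```erlang
-module(mailbox_probe).
-export([fill/1]).

%% Spawn a process that ignores everything except `stop`, send it N
%% messages, and return its queue length and memory before and after.
fill(N) ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Items = [message_queue_len, memory],
    Before = erlang:process_info(Pid, Items),
    %% Each message carries a 100-element list so the growth is visible.
    lists:foreach(fun(I) -> Pid ! {payload, I, lists:seq(1, 100)} end,
                  lists:seq(1, N)),
    After = erlang:process_info(Pid, Items),
    Pid ! stop,   %% let the probe process terminate
    {Before, After}.
```

Calling `mailbox_probe:fill(1000)` returns the `process_info` results from before and after the sends, so the two `memory` values can be compared. Under a per-process memory limit, that growth is what a sender could exploit.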

Systems other than embedded Erlang deployments (given the current movement of Erlang towards a server/desktop platform)
will suffer from resource contention between the Erlang VM and other OS processes.

Port programs also need system resources...

Well, in the end your approach is still very interesting as a framework for continuous Erlang VM innovation...

But please correct me if I'm wrong: I saw that the memory carriers allow several options to be set at Erlang VM start-up, so

is it still possible to patch those carriers to allow a safe memory reservation, letting the VM properly handle a memory-full
condition by killing the offending process (a sort of OOM killer for the VM)?

Just tell the VM not to kill system processes, and let the supervisors do the work...
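
The "OOM killer for the VM" idea can be roughly approximated from inside Erlang, without patching the carriers, by a watchdog process. A minimal sketch, assuming that "offending process" simply means the one with the largest `process_info(Pid, memory)`; the module name `oom_watchdog` and its functions are hypothetical, and nothing here reserves memory, so the VM can still die before the watchdog gets to run:

```erlang
-module(oom_watchdog).
-export([start/2, check/1, biggest/0]).

%% Start a watchdog that checks total VM memory every IntervalMs ms.
start(LimitBytes, IntervalMs) ->
    spawn(fun() -> loop(LimitBytes, IntervalMs) end).

loop(LimitBytes, IntervalMs) ->
    timer:sleep(IntervalMs),
    check(LimitBytes),
    loop(LimitBytes, IntervalMs).

%% If the VM uses more than LimitBytes, kill the largest process
%% (never the caller itself); otherwise do nothing.
check(LimitBytes) ->
    case erlang:memory(total) > LimitBytes of
        true ->
            case biggest() of
                {Pid, _Bytes} when Pid =/= self() ->
                    exit(Pid, kill),
                    {killed, Pid};
                _ ->
                    none
            end;
        false ->
            ok
    end.

%% Return {Pid, Bytes} for the process with the largest memory footprint.
%% Processes that die during the scan return `undefined` and are skipped.
biggest() ->
    Sizes = [{P, M} || P <- erlang:processes(),
                       {memory, M} <- [erlang:process_info(P, memory)]],
    case lists:reverse(lists:keysort(2, Sizes)) of
        [Top | _] -> Top;
        [] -> none
    end.
```

`oom_watchdog:start(2 * 1024 * 1024 * 1024, 1000)` would check once per second against a 2 GB limit. The sketch does not spare system processes, and, as Robert points out further down the thread, the largest process is not necessarily the culprit.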

/Angel



On Monday, 9 November 2009, Tony Rogvall wrote:
> 
> It is still the same "let it crash" concept using the resource limit  
> system I am designing.
> But you can limit the crash in a more controlled way. Also you will be  
> able to report
> interesting information about what is crashing and when.
> 
> There is sometimes an issue when big systems crash. The restart may  
> take a lot of time.
> Nodes must be synchronised, database tables must be repaired etc etc.
> I guess you can design this to be light and easy, but it is not always  
> the case.
> 
> /Tony
> 
> 
> 
> On 9 nov 2009, at 12.43, Angel Alvarez wrote:
> 
> >
> > Well please let me say something
> >
> > I'm brand new, but some things are pretty clear to me.
> >
> >
> > The beauty of the Erlang concept is "let it crash", "don't program
> > defensively",
> > so the VM and the underlying hardware are entities that can fail,
> > that's it.
> >
> > So what's the problem?
> >
> > Joe said...
> >
> > If you want failure tolerance you need at least two nodes...
> >
> > From J.A.'s thesis:
> >   " ...Schneider [60, 59] answered this question by giving three  
> > properties
> > that he thought a hardware system should have in order to be  
> > suitable for
> > programming a fault-tolerant system. These properties Schneider  
> > called:..."
> >
> > 1. Halt on failure — in the event of an error a processor should halt
> >      instead of performing a possibly erroneous operation.
> >
> > So on memory exhaustion the VM has to die and another (Erlang) node
> > will do the recovery.
> >
> > That's the role of distribution: not only to spread computations over
> > several nodes to enhance performance,
> > but to provide resilience in the presence of fatal (non-correctable)
> > errors.
> >
> > As an OS process the VM has to compete with other OS processes, so in
> > a shared deployment (a VM running
> > on a server or a desktop) you can't be safe against an OOM triggered by
> > other entities.
> >
> > Such a resource control mechanism will only increase process overhead
> > and context switching in the VM.
> >
> > People new to Erlang will be attracted to this hierarchical
> > decomposition of tasks, as Joe stated in his thesis:
> > "If you can't run a certain task, try doing something simpler"
> >
> > Many languages and VMs are incorporating Erlang's good "multicore"
> > features but not Erlang's powerful error-handling concept,
> > and you guys want to kill the simplicity by incorporating many
> > defensive capabilities to avoid fatality instead of just organizing
> > code to
> > handle such fatality.
> >
> > What's next? A maximum message queue length for mailboxes?
> >
> >
> > Well, that's all I have to say about that, Forrest Gump.
> >
> >
> > On Monday, 9 November 2009 09:45:10, Tony Rogvall wrote:
> >> Interesting discussion!
> >>
> >> I have been working on a resource system for Erlang for nearly two
> >> years now.
> >> I have a working (tm) prototype where you can set resource limits  
> >> like
> >> max_processes/max_ports/max_memory/max_time/max_reductions ...
> >> The limits are passed with spawn_opt and are inherited by the
> >> processes spawned.
> >> This means that if you spawn_opt(M, F, A, [{max_memory, 1024*1024}]) the
> >> process will be able to use 1M words for itself and its "subprocesses".
> >> This also means that the spawner will get 1M less to use (as designed
> >> right now). If a resource limit is reached, the process crashes with
> >> reason system_limit.
> >>
> >> There are still some details to work out before a release, but I will
> >> try to get it ready before
> >> end of this year.
> >>
> >> /Tony
> >>
> >>
> >>
> >> On 9 nov 2009, at 09.16, Robert Virding wrote:
> >>
> >>> No.
> >>>
> >>> There is a major difference between handling OOM in an OS and in the
> >>> BEAM.
> >>> In an OS it is usually at a per-process level that memory runs out, so
> >>> it is easy to decide which process to kill so that the OS can continue.
> >>> In the BEAM, however, it is the VM as a whole which has run out of
> >>> memory, not a specific process; it is, therefore, much more difficult
> >>> to work out which process is the culprit and to decide what to do. For
> >>> example, it might be that the process which causes the OOM is not the
> >>> actual problem process; it might just be the last straw. Or the actual
> >>> cause may be that the whole app is generating large binaries too
> >>> quickly. Or it might be that the whole app is spawning too many
> >>> processes without any one process being the cause. Or ...
> >>> In all these cases killing the process which triggered the OOM would
> >>> be the Wrong Thing. We found that it was difficult to work out a
> >>> reasonable strategy to handle the actual cause, so we decided not to
> >>> handle it.
> >>>
> >>> "Don't catch an error which you can't handle" as the bard put it.
> >>>
> >>> Robert
> >>>
> >>> 2009/11/9 Max Lapshin <max.lapshin@REDACTED>
> >>>
> >>>> Yes, there are techniques to write watchdogs, but my question
> >>>> was: is
> >>>> it possible to prevent the Erlang VM from crashing?
> >>>>
> >>>> ________________________________________________________________
> >>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
> >>>> erlang-questions (at) erlang.org
> >>>>
> >>>>
> >>
> >>
> >
> >
> > This email has no pictures. The strange shapes on the screen are
> > letters.
> > __________________________________________
> >
> > Clist UAH a.k.a Angel
> > __________________________________________
> > Warning: Microsoft_bribery.ISO contains OOXML code.
> >
> 
> 



-- 
Do not print this email unless necessary. The environment is in our hands.
__________________________________________

Clist UAH a.k.a Angel
__________________________________________
China 'cleans up' Tibet for the Olympics.

