[erlang-questions] Why Beam.smp crashes when memory is over?

Mon Nov 9 17:21:41 CET 2009

On Monday, 9 November 2009, Tony Rogvall wrote:
> Hi!
> 
> On 9 nov 2009, at 14.54, Angel Alvarez wrote:
> 
> > Well, there are still many issues with this new approach.
> >
> Yes!  But it does not scare me ;-)
> 
> > Where are the mailboxes of processes located?
> >
> > With a heap per process...
> 
> Depends on the implementation. But in general you could do something
> like this: if the data is shared, then you split the share
> (memory_size/ref_count). If the data is copied, then you must count
> it in.
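
A minimal sketch of the accounting rule described above, in Erlang. The module name mem_share and the function charge/3 are hypothetical and are not taken from the prototype; sizes are assumed to be in words, and rounding down with integer division is an arbitrary choice.

```erlang
-module(mem_share).
-export([charge/3]).

%% Hypothetical helper, not part of the prototype.
%% charge(Kind, Size, RefCount) returns the amount of memory billed
%% to one process for a term of Size words.
%% shared: the cost is split among the RefCount holders.
%% copied: each holder owns its own copy, so the full size is billed.
charge(shared, Size, RefCount) when is_integer(RefCount), RefCount > 0 ->
    Size div RefCount;
charge(copied, Size, _RefCount) ->
    Size.
```

For example, mem_share:charge(shared, 1000, 4) bills 250 words to each holder, while mem_share:charge(copied, 1000, 4) bills the full 1000.
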
> 
> >
> > Couldn't you trigger a memory exception in a remote process just by
> > sending it one message when the process has almost used up its
> > reserved memory?
> >
> Yes. But that is the point. If it passes the limit, the process will die.
> There are many special cases where you could think of using the memory
> in a better and more optimal way.
> Let's say you are reaching the memory limit: you might switch to a
> compression algorithm for heap memory!?
> But let's keep it simple for the prototype and see if it is useful.
> 
> 
> > Deployments other than embedded Erlang (following the current
> > movement of Erlang towards server/desktop platforms) will suffer from
> > resource contention between the Erlang VM and other OS processes.
> >
> > Port programs also need system resources...
> >
> For loadable drivers using driver_alloc, one could possibly do
> something; otherwise it will be up to the driver designer to handle
> it. There is a max_ports limit in the prototype that limits the number
> of open_ports. If sockets/files are mapped to single ports, then it
> may help a bit.
> 
> 
> > Well, in the end your approach is still very interesting as a
> > framework for continuous Erlang VM innovation...
> >
> Thanks.
> 
> > But please correct me if I'm wrong: I saw that the memory carriers
> > allow several options to be set at Erlang VM start-up, so
> >
> I am not sure what you mean here?

Well, some bits can be controlled as you show with spawn_opt. On the other hand, I mean that VM safe memory,
as mentioned by others (Andrew, Tom ...), should be controlled in the memory carriers: see erts_alloc(3).

In essence:

- A new emergency memory allocator in the alloc_util framework, in case the VM needs memory to recover from an OOM.

or perhaps

<WARNING speculative mode ON>

- Using the segment allocator on architectures that support mmap to allow fast recovery of the full Erlang VM (a sort of checkpointing).
Using a special BIF, you could instruct the memory carriers to checkpoint the entire VM with this allocator, in case the VM crashes,
so that on the next run the entire VM can be recovered by parsing the last checkpoint.

</WARNING>

Still, I think having two or more instances is better and cleaner than bulletproofing the VM.

/Angel 

> 
> /Tony
> 
> > is it still possible to patch those carriers to allow a safe memory
> > reservation, so that the VM can properly manage a memory-full
> > condition by killing the offending process (a sort of OOM killer for
> > the VM)?
> >
> > Just tell the VM not to "kill system processes" and let the
> > supervisors do the work...
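
A rough sketch of what such a VM-level "OOM killer" could look like from Erlang code, using only erlang:processes/0, erlang:process_info/2 and exit/2. The module oom_sketch and its functions are hypothetical. It assumes that "offending" means "currently largest", which, as Robert points out further down the thread, is not necessarily the real culprit.

```erlang
-module(oom_sketch).
-export([largest/0, largest/1, kill_largest/0]).

%% Hypothetical sketch only. It has to run before memory is actually
%% exhausted, because the VM aborts when an allocation fails.

%% Pick the {Pid, Bytes} pair with the largest Bytes, or none.
largest([]) ->
    none;
largest([First | Rest]) ->
    lists:foldl(fun({_, B} = P, {_, Max}) when B > Max -> P;
                   (_, Acc) -> Acc
                end, First, Rest).

%% Sample the memory (in bytes) of every live process. Processes that
%% exit during the scan return undefined and are skipped.
largest() ->
    largest([{P, B} || P <- erlang:processes(),
                       {memory, B} <- [erlang:process_info(P, memory)]]).

%% Kill the largest process and leave the restart to its supervisor.
%% Caution: this may pick a system process or the caller itself.
kill_largest() ->
    case largest() of
        none ->
            none;
        {Pid, Bytes} ->
            exit(Pid, kill),
            {killed, Pid, Bytes}
    end.
```

A real policy would at least need to exclude system processes and decide when to act, for instance by comparing erlang:memory(total) against a threshold.
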
> >
> 
> 
> 
> > /Angel
> >
> >
> >
> > On Monday, 9 November 2009, Tony Rogvall wrote:
> >>
> >> It is still the same "let it crash" concept using the resource limit
> >> system I am designing.
> >> But you can limit the crash in a more controlled way. Also, you will
> >> be able to report interesting information about what is crashing and
> >> when.
> >>
> >> There is sometimes an issue when big systems crash. The restart may
> >> take a lot of time.
> >> Nodes must be synchronised, database tables must be repaired, etc.
> >> I guess you can design this to be light and easy, but that is not
> >> always the case.
> >>
> >> /Tony
> >>
> >>
> >>
> >> On 9 nov 2009, at 12.43, Angel Alvarez wrote:
> >>
> >>>
> >>> Well, please let me say something.
> >>>
> >>> I'm plain new, but some things are pretty clear to me.
> >>>
> >>>
> >>> The beauty of the Erlang concept is "let it crash", "don't program
> >>> defensively",
> >>> so the VM and the underlying hardware are entities that can fail,
> >>> and that's it.
> >>>
> >>> So what's the problem?
> >>>
> >>> Joe said...
> >>>
> >>> If you want failure tolerance you need at least two nodes...
> >>>
> >>> From J.A.'s thesis:
> >>>  " ...Schneider [60, 59] answered this question by giving three
> >>> properties
> >>> that he thought a hardware system should have in order to be
> >>> suitable for
> >>> programming a fault-tolerant system. These properties Schneider
> >>> called:..."
> >>>
> >>> 1. Halt on failure — in the event of an error a processor should  
> >>> halt
> >>>     instead of performing a possibly erroneous operation.
> >>>
> >>> So on memory exhaustion the VM has to die, and another (Erlang) node
> >>> will do the recovery.
> >>>
> >>> That's the role of distribution: not only to spread computations over
> >>> several nodes to enhance performance,
> >>> but also to provide resilience in the presence of fatal
> >>> (non-correctable) errors.
> >>>
> >>> As an OS process, the VM has to compete with other OS processes, so in
> >>> a shared deployment (a VM running
> >>> on a server or a desktop) you can't be safe against an OOM triggered by
> >>> other entities.
> >>>
> >>> Such a resource control mechanism will only increase process overhead
> >>> and context switching in the VM.
> >>>
> >>> People new to Erlang will be attracted to this hierarchical
> >>> decomposition of tasks, as Joe stated in his thesis:
> >>> "If you can't run a certain task, try doing something simpler."
> >>>
> >>> Many languages and VMs are incorporating Erlang's good "multicore"
> >>> features but not Erlang's powerful error handling concept,
> >>> and you guys want to kill the simplicity by incorporating many
> >>> defensive capabilities to avoid fatality instead of just organizing
> >>> code to handle such fatality.
> >>>
> >>> What's next? A maximum limit on the mailbox message queue?
> >>>
> >>>
> >>> Well, that's all I have to say about that, Forrest Gump.
> >>>
> >>>
> >>> On Monday, 9 November 2009 09:45:10, Tony Rogvall wrote:
> >>>> Interesting discussion!
> >>>>
> >>>> I have been working on a resource system for Erlang for nearly two
> >>>> years now.
> >>>> I have a working (tm) prototype where you can set resource limits
> >>>> like
> >>>> max_processes/max_ports/max_memory/max_time/max_reductions ...
> >>>> The limits are passed with spawn_opt and are inherited by the
> >>>> processes spawned.
> >>>> This means that if you spawn_opt(M,F,A,[{max_memory, 1024*1024}]),
> >>>> the process will be able to use 1M words for itself and its
> >>>> "subprocesses". This also means that the spawner will get 1M less
> >>>> to use (as designed right now). If a resource limit is reached, the
> >>>> process crashes with reason system_limit.
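
The max_memory option above belongs to the prototype and is not available in standard Erlang/OTP. As a rough stand-in, and assuming Erlang/OTP 19.0 or later, spawn_opt accepts a max_heap_size option. It differs from the design described above: it limits a single process rather than a budget shared with "subprocesses", the spawner gives nothing up, and the process is killed instead of crashing with system_limit. The module name heap_limit_demo and the sizes are chosen only for illustration.

```erlang
-module(heap_limit_demo).
-export([run/0]).

%% Assumes Erlang/OTP 19.0 or later, where max_heap_size exists.
%% This is NOT the max_memory option of the prototype.
run() ->
    Limit = #{size => 100000,          % limit in words, this process only
              kill => true,            % kill the process at the limit
              error_logger => false},  % do not log a report
    {Pid, Ref} = spawn_opt(fun() -> grow([]) end,
                           [monitor, {max_heap_size, Limit}]),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            Reason
    after 10000 ->
        exit(Pid, kill),
        timeout
    end.

%% Keep every chunk reachable so the heap can only grow.
grow(Acc) ->
    grow([lists:seq(1, 1000) | Acc]).
```

Calling heap_limit_demo:run() should return the exit reason of the runaway process, which is killed once its heap passes the limit.
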
> >>>>
> >>>> There are still some details to work out before a release, but I  
> >>>> will
> >>>> try to get it ready before
> >>>> end of this year.
> >>>>
> >>>> /Tony
> >>>>
> >>>>
> >>>>
> >>>> On 9 nov 2009, at 09.16, Robert Virding wrote:
> >>>>
> >>>>> No.
> >>>>>
> >>>>> There is a major difference between handling OOM in an OS and in the
> >>>>> BEAM.
> >>>>> In an OS it is usually at a per-process level that memory runs out, so
> >>>>> it is easy to decide which process to kill so that the OS can
> >>>>> continue. In the BEAM, however, it is the VM as a whole which has run
> >>>>> out of memory, not a specific process; it is, therefore, much more
> >>>>> difficult to work out which process is the culprit and to decide what
> >>>>> to do. For example, it might be that the process which causes the OOM
> >>>>> is not the actual problem process; it might just be the last straw. Or
> >>>>> the actual cause may be that the whole app might be generating large
> >>>>> binaries too quickly. Or it might be that the whole app is spawning
> >>>>> too many processes without any one process being the cause. Or ...
> >>>>> In all these cases killing the process which triggered the OOM would
> >>>>> be the Wrong Thing. We found that it was difficult to work out a
> >>>>> reasonable strategy to handle the actual cause, so we decided not to
> >>>>> handle it.
> >>>>>
> >>>>> "Don't catch an error which you can't handle" as the bard put it.
> >>>>>
> >>>>> Robert
> >>>>>
> >>>>> 2009/11/9 Max Lapshin <max.lapshin@REDACTED>
> >>>>>
> >>>>>> Yes, there are techniques to write watchdogs, but my question
> >>>>>> was: is it possible to prevent the Erlang VM from crashing?
> >>>>>>
> >>>>>> ________________________________________________________________
> >>>>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
> >>>>>> erlang-questions (at) erlang.org
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> This email has no pictures. The strange shapes on the screen are
> >>> letters.
> >>> __________________________________________
> >>>
> >>> Clist UAH a.k.a Angel
> >>> __________________________________________
> >>> Warning: Microsoft_bribery.ISO contains OOXML code.
> >>>
> >>>
> >>
> >>
> >
> >
> >
> > -- 
> > Do not print this email unless necessary. The environment is in
> > our hands.
> > __________________________________________
> >
> > Clist UAH a.k.a Angel
> > __________________________________________
> > China 'cleans up' Tibet for the Olympics.
> 
> 

-- 
This email has no pictures. The strange shapes on the screen are letters.
__________________________________________

Clist UAH a.k.a Angel
__________________________________________
Artist -- (internet) --> End user. That way artists earn more and talk less nonsense about what they think piracy is.