[erlang-questions] Why Beam.smp crashes when memory is over?

Jayson Vantuyl kagato@REDACTED
Tue Nov 10 21:22:12 CET 2009


Dropping messages, suspending processes, and crashing processes are
all just bad ideas.

Erlang's messaging is not "left-guarded" in the sense described by  
Hoare.  That means that any behavior that suspends processes when the  
remote mailbox is full can either exhaust memory (i.e. what we have  
now) or arbitrarily deadlock the system.  The only requirement for a  
deadlock is a loop in messaging anywhere, and we have more than a few  
of those.
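
To make that concrete, here is a minimal sketch of such a loop (the
module and names are made up): two gen_servers that call each other.
Today the inner call merely times out after gen_server's default five
seconds, crashing both; if ordinary sends blocked on a full mailbox,
the same cycle would wedge forever, with no timeout to rescue it.

  %% Hypothetical demo module: a two-process call cycle.
  -module(loop_demo).
  -behaviour(gen_server).
  -export([start/0, init/1, handle_call/3, handle_cast/2]).

  start() ->
      {ok, A} = gen_server:start(?MODULE, no_peer, []),
      {ok, B} = gen_server:start(?MODULE, A, []),
      ok = gen_server:call(A, {set_peer, B}),
      %% A calls B; inside that call, B calls A back. Neither can
      %% answer the other, so both block -- a deadlock, broken only
      %% by gen_server:call's default 5000 ms timeout.
      gen_server:call(A, ping).

  init(Peer) -> {ok, Peer}.

  handle_call({set_peer, P}, _From, _Old) ->
      {reply, ok, P};
  handle_call(ping, _From, Peer) ->
      %% This inner call can never be served: our own caller is
      %% itself blocked waiting for us to reply.
      {reply, gen_server:call(Peer, ping), Peer}.

  handle_cast(_Msg, State) -> {noreply, State}.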

Dropping messages is probably the least disruptive, but (in  
applications that use OTP behaviors at least) it would just translate  
into gen_server:call timeouts, and we're back to defensive programming  
again.
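
Concretely, every call site would end up wearing armor like this
(ask/2 and the server name are hypothetical), because a dropped
request, a dropped reply, and a merely slow server all look exactly
the same from the outside:

  %% Defensive wrapper a "drop messages" policy would force on us:
  %% the only symptom of a dropped message is a timeout exit.
  ask(Server, Request) ->
      try gen_server:call(Server, Request, 5000) of
          Reply -> {ok, Reply}
      catch
          exit:{timeout, _} ->
              %% Was the request dropped? The reply? Is the server
              %% just busy? The caller cannot tell; it can only
              %% guess, then retry or give up.
              {error, timeout}
      end.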

I'm all for a heuristic that sends a signal to a program when it's
getting close (i.e. a memory allocation failed and "reserved" memory
is starting to be consumed, so trigger an alarm of some sort), but
everything proposed so far compromises the current behavior of the VM
in ways that are awfully unpredictable.
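
As a sketch of the kind of signal I mean (the threshold and poll
interval below are invented, SASL's alarm_handler is assumed to be
running, and os_mon's memsup already provides a more principled
version of the same idea), something could watch the emulator's own
accounting and raise an alarm, instead of the VM picking a victim:

  %% Hypothetical watchdog: raise/clear an alarm around a soft
  %% memory threshold, leaving the response to the application.
  -module(mem_watch).
  -export([start/1]).

  start(LimitBytes) ->
      spawn(fun() -> loop(LimitBytes, false) end).

  loop(Limit, Raised) ->
      Used = erlang:memory(total),
      NowRaised =
          case {Used > Limit, Raised} of
              {true, false} ->
                  alarm_handler:set_alarm({vm_memory_high, Used}),
                  true;
              {false, true} ->
                  alarm_handler:clear_alarm(vm_memory_high),
                  false;
              _ ->
                  Raised
          end,
      timer:sleep(1000),
      loop(Limit, NowRaised).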

On Nov 10, 2009, at 4:45 AM, Joe Armstrong wrote:

> This is a very interesting problem. If processes have quotas, then
> how could you set the quota value?
>
> A perfectly correct process might have a very deep stack just once
> in its life and otherwise be fine. Whether to crash this process or
> not would depend upon what the other processes in the system
> happened to be doing at the time. This would be very unfortunate -
> it's like your program being hit by a cosmic ray - nasty. It creates
> a random non-deterministic coupling between things that are supposed
> to be independent.
>
> A possibility that just occurred to me might be to suspend processes
> that appear to be running wild until such a time as the overall
> memory situation looks good.
>
> Imagine two scheduler queues. One for well-behaved programs. One for
> programs whose stacks and heaps are growing too fast. If memory is
> no problem we run programs from both queues. If memory is tight we
> run processes in the "problem" queue less often and with frequent
> garbs.
>
> Killing a program with a large stack and heap, just because there
> happens to be a temporary memory problem, seems horrible, especially
> since the problem might go away if we wait a few milliseconds.
>
> Suspending a memory-hungry process for a while, until memory is
> available, seems less objectionable. Perhaps it could be swapped out
> to disk and pulled in a lot later. Killing things at random in the
> hope it might help sounds like a really bad idea. Process migration
> could solve this - move the process to a machine that has got more
> memory.
>
> Suspending things seems OK - you might even suspend an errant
> process forever and reclaim the memory - but not kill it. Some other
> process could detect that the process is not responding and kill it,
> and thus all the semantics of the application would be obeyed
> (processes are allowed to be unresponsive, that's fine) and the
> semantics of the error recovery should say what to do in this case.
>
> Just killing processes when they have done nothing wrong is not a  
> good idea.
>
> /Joe
>
>
>
> On Tue, Nov 10, 2009 at 1:10 PM, Ulf Wiger
> <ulf.wiger@REDACTED> wrote:
>> Richard O'Keefe wrote:
>>>>
>>>> One way would be to let the user set a memory quota on a process
>>>> with options at spawn time. When the process reaches its quota it
>>>> can be automatically killed, or the user can be notified in some
>>>> way and take action.
>>>
>>> One of the reasons this hasn't been done is, I presume, the fact
>>> that it is quite difficult for a programmer to determine what the
>>> memory quota should be.  It depends on
>>>  ...
>>
>> I implemented resource limits in erlhive - at the Erlang level rather
>> than in the VM. The purpose was to be able to run foreign code safely
>> in a hosted environment. Eliminating the possibility of doing damage
>> through traditional side-effects was relatively easy with a code
>> transform, but two ways of staging a DoS attack would be to gobble
>> RAM or CPU capacity. I approached this by inserting calls to a check
>> function that sampled heap size, and by starting a "watchdog" process
>> that would unceremoniously kill the program after a certain time.
>>
>> In short, I can see a need for such limits, and would like to include
>> a reduction ceiling as well. The limits could be set after careful
>> testing, high enough that they protect against runaway processes. A
>> reduction limit could be checked at the end of each slice, perhaps.
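
(The shape Ulf describes might look roughly like this - not erlhive's
actual code, just a sketch of the inserted check function, a
reduction ceiling, and a wall-clock watchdog:)

  %% guard:check/2 is what the code transform would sprinkle into
  %% the foreign code; start_watchdog/2 kills the job on a timer.
  -module(guard).
  -export([check/2, start_watchdog/2]).

  check(MaxHeapWords, MaxReductions) ->
      {heap_size, Heap} = erlang:process_info(self(), heap_size),
      {reductions, Reds} = erlang:process_info(self(), reductions),
      if
          Heap > MaxHeapWords  -> exit(heap_quota_exceeded);
          Reds > MaxReductions -> exit(reduction_quota_exceeded);
          true                 -> ok
      end.

  start_watchdog(Pid, TimeoutMs) ->
      spawn(fun() ->
                Ref = erlang:monitor(process, Pid),
                receive
                    {'DOWN', Ref, process, Pid, _Reason} -> ok
                after TimeoutMs ->
                    exit(Pid, kill)   % unceremonious, as advertised
                end
            end).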
>>
>> In my experience, per-process memory usage is fairly predictable in
>> erlang. Does anyone have a different experience?
>>
>> BR,
>> Ulf W
>> --
>> Ulf Wiger
>> CTO, Erlang Training & Consulting Ltd
>> http://www.erlang-consulting.com
