[erlang-questions] Why Beam.smp crashes when memory is over?

Richard O'Keefe ok@REDACTED
Wed Nov 11 00:13:01 CET 2009


On Nov 11, 2009, at 1:45 AM, Joe Armstrong wrote:

> A perfectly correct process might just have a very deep stack, just once
> in its life and otherwise be fine.  Whether to crash this process or not
> would depend upon what the other processes in the system happened to be
> doing at the time.

Surely with per-process quotas the process would be crashed on exceeding
its limit no matter what any other processes might be doing?
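
To put it concretely, such a quota might be expressed at spawn time
roughly like this; the {max_heap_size, ...} option and its
kill-on-excess behaviour are assumed here purely for illustration:

    -module(quota_demo).
    -export([run/0]).

    %% Assumed quota semantics: the spawned process is killed as soon
    %% as its own heap (plus stack) exceeds the given number of words,
    %% regardless of how much free memory the rest of the VM has.
    run() ->
        {Pid, Ref} =
            spawn_opt(fun () -> grow([]) end,
                      [monitor,
                       {max_heap_size, #{size => 50000,   %% quota, in words
                                         kill => true,
                                         error_logger => true}}]),
        receive
            {'DOWN', Ref, process, Pid, Reason} ->
                %% Reason tells us this process exceeded its quota, not
                %% that the machine as a whole was out of memory.
                io:format("process ~p exited: ~p~n", [Pid, Reason])
        end.

    grow(Acc) ->
        grow([lists:seq(1, 1000) | Acc]).   %% grow the heap without bound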

I may be misunderstanding you.  One thing that has plagued the UNIX
world for years is over-commitment.  A bunch of UNIX processes ask
for lots of virtual memory each, and the operating system says yes
to all of them, but doesn't actually give them the memory.  When
they try to touch the memory, it's then that the operating system
actually allocates the pages.  And it's then that the operating
system discovers that "oops, each process is within its quota, but
I don't actually have enough memory to go around."  And it kills
processes until it does.  Quite often, the wrong ones.

Some UNIX programmers (me amongst others) say this is ridiculous:
a well-written process won't ask for memory it doesn't expect to
need, and if it can't have that memory, it should be told so right
away.  Others say no, it's normal practice to reserve far more than
you are really likely to need, so over-commitment is essential.

So if a process usually needs 10K but *might* need 10M, and there
are process quotas, it has to ask for all 10M.  Then either the VM
refuses to create new processes when the sum of the quotas exceeds
available memory, in which case most of the memory may lie idle
because the quotas were worst-case requests, or the VM allows new
processes to be created even so, in which case you can end up with
a bunch of processes all within their quotas while the VM runs out
of memory anyway.

Erlang already has 'hibernate', where a process can allow the VM
to reclaim most of its memory.  This suggests something that does
the opposite.  If a process can *tell* when it's going to need a
lot more memory than usual, there could be an operation that
says "please increase my quota to B bytes + W words, and if you
can't do that just now, suspend me until you can".
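
For reference, here is what hibernation looks like today, together
with the suggested quota-raising operation sketched as a hypothetical
BIF; erlang:request_quota/2 does not exist, it is only there to show
the intended shape:

    -module(quota_sketch).
    -export([idle/0, awake/0, big_job/1]).

    %% What exists today: throw away the stack, shrink the heap, and
    %% give the memory back to the VM until the next message arrives;
    %% execution then resumes in awake/0.
    idle() ->
        erlang:hibernate(?MODULE, awake, []).

    awake() ->
        receive
            Msg -> io:format("woke up for ~p~n", [Msg])
        end,
        idle().

    %% The opposite direction: before a known memory spike, ask the VM
    %% for B bytes + W words of headroom, and be suspended until the VM
    %% can actually grant it.  (Hypothetical BIF.)
    big_job(Input) ->
        ok = erlang:request_quota(10 * 1024 * 1024, 0),
        do_big_computation(Input).

    do_big_computation(Input) ->
        Input.   %% stands in for the memory-hungry part of the job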

> A possibility that just occurred to me might be to suspend processes that
> appear to be running wild until such a time as the overall memory
> situation looks good.

A computation might use a lot of memory by setting up one large
process.

A computation might use a lot of memory by setting up a large
number of small processes.

The VM could be running out of memory, and there could be a large
process, but _that_ process might be completely innocent.
200 (small) sheep weigh more than one (large) elephant and they
breed a lot faster.
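
For instance, nothing in the following looks like a culprit on its
own, yet the flock as a whole claims gigabytes (the sizes are made up
purely for illustration):

    -module(flock).
    -export([flock/0]).

    %% Each process holds a list of a million small integers: roughly
    %% 16MB of heap on a 64-bit emulator, unremarkable by itself.
    %% Two hundred of them together claim a few gigabytes.
    flock() ->
        [spawn(fun () ->
                   Wool = lists:seq(1, 1000000),
                   receive stop -> length(Wool) end
               end)
         || _ <- lists:seq(1, 200)].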

A process might not be growing its own stack or heap at all, but
none the less might be (indirectly) responsible for increasing
demands on memory.
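
For example (an illustrative sketch, not taken from any real system):
the producer below keeps its own heap small, yet it is the process
filling memory, because everything it sends is copied to a consumer
that never reads its mailbox.

    -module(indirect_demand).
    -export([start/0]).

    start() ->
        %% A consumer that only ever waits for 'never': every other
        %% message just piles up in its mailbox.
        Consumer = spawn(fun () -> receive never -> ok end end),
        spawn(fun () -> drip(Consumer) end),
        Consumer.

    %% The producer loops in (roughly) constant space of its own, but
    %% it is the one responsible for the consumer's growth.
    drip(Consumer) ->
        Consumer ! lists:duplicate(10000, x),
        drip(Consumer).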

Someone has already proposed a system whereby a process has to
share its memory quota with the processes it spawns.


By the way, I note that Java has precisely the same kind of
problem.  You can create a thread with a bound on its stack
size, but the documentation for the relevant Thread constructor
says:
	Allocates a new Thread object so that it ...
	has the specified stack size.  ...
	The stack size is the approximate number of bytes
	of address space that the virtual machine is to
	allocate for this thread's stack.
	The effect of the stackSize parameter, if any,
	is highly platform dependent.
[That sentence is bold in the original.]
	On some platforms, specifying a higher value
	for the stackSize parameter may allow a thread
	to achieve greater recursion depth before throwing
	a StackOverflowError.  Similarly, specifying a
	lower value may allow a greater number of threads
	to exist concurrently without throwing an
	OutOfMemoryError (or other internal error).
	The details of the relationship between the value
	of the stackSize parameter and the maximum
	recursion depth and concurrency level are
	platform-dependent. On some platforms, the value
	of the stackSize parameter may have no effect whatsoever.
[That sentence is bold in the original.]

What happens when Java runs out of memory because the heap fills
up?  It's not completely clear from the documentation, but it
appears that whichever thread's allocation attempt fails gets an
OutOfMemoryError, even if that's the smallest thread in the whole
system.  This error doesn't seem to be handled in very many Java
programs; at any rate, I've often seen perfectly good Java programs
crash with an unhandled OutOfMemoryError on my 4GB laptop because
the default heap limit is 64MB.

So, the Erlang VM crashes when memory runs out?
That is, in practice, exactly what happens in Java.
Can it be that Java programmers just don't expect the
same quality of service from VMs as Erlang programmers do?
(:-) (:-) (:-)





