[erlang-questions] Fail fast
Fri Dec 23 15:41:10 CET 2011
The ability to handle this is a "feature" that I think is missing from Erlang. The VM and the language is very stable, but the fact that a single, poorly behaving, process can cause the VM to die is pretty undesirable. I had a bug in a "logging" process where an ETS table wasn't getting purged properly. It grew and grew eventually bringing down the entire VM due to an OOM condition. This process wasn't significant to the operation of the system, (and if I wanted it to be I would've written a supervisor to manage it), yet it killed a critical service.
My personal wish would be the ability to optionally apply limits to a process when it is spawned (memory, ets table sizes, message queue would be a good start). When one or more of the limits are exceeded the process can be killed (and then trapped/supervised if needed). It would make the VM more stable, and would also assist in debugging (since it would be easy to see in the sasl logs what happened without needing to look at a crash dump). One
Date: Fri, 23 Dec 2011 07:44:49 +0000
Subject: Re: [erlang-questions] Fail fast
No, if BEAM cannot allocate more memory, the node just dies. In case you are interested, this handling of OOM condition has been discussed on the mailing list in the past. Supervision hierarchies don't help in this case.
On 23 December 2011 02:03, Jon Watte <jwatte@REDACTED> wrote:
If there was a proper supervision hierarchy all the way up to the "root" of the application, why would this happen? Wouldn't the supervisors just kill off whatever process ends up not being able to allocate memory, and thus clean up? (Perhaps kicking off users at the same time) If it fails far enough up, wouldn't it basically reset the erl environment to "scratch" ? Or would that be expecting too much from the supervision hierarchy?
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.
On Tue, Dec 20, 2011 at 6:23 PM, Chandru <chandrashekhar.mullaparthi@REDACTED> wrote:
I've just had a failure in one of my live services because an erlang node ran out of memory (caused by a traffic spike). Restart mechanisms exist to restart the node, but the node took a long time to die because it was writing a large erl_crash.dump file, and then there was a 7GB core dump.
Is there a quicker way to fail? I'm thinking of disabling core dumps entirely on the box. What else can I do? A configuration option on the node to only produce a summary erl_crash.dump would be nice. The most useful things for me in a crash dump usually are the slogan at the top, and the message queue lengths of each process. In this particular case, the slogan would've told me all that I needed to know.
erlang-questions mailing list
erlang-questions mailing list
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the erlang-questions