[erlang-questions] Fail fast

Matthew Evans mattevans123@REDACTED
Fri Dec 23 15:44:58 CET 2011

Email got sent too soon.....
The ability to handle this is a "feature" that I think is missing from Erlang. The VM and the language are very stable, but the fact that a single poorly behaving process can cause the VM to die is pretty undesirable. I had a bug in a "logging" process where an ETS table wasn't getting purged properly. It grew and grew, eventually bringing down the entire VM due to an OOM condition. This process wasn't significant to the operation of the system (and if it had been, I would have written a supervisor to manage it), yet it killed a critical service.


My personal wish would be the ability to optionally apply limits to a process when it is spawned (memory, ETS table sizes and message queue length would be a good start). When one or more of the limits are exceeded, the process can be killed (and then trapped/supervised if needed). It would make the VM more stable, and it would also assist in debugging, since it would be easy to see in the SASL logs what happened without needing to look at a crash dump. Another advantage is in testing: the limits could be set temporarily to find possible memory hogs and head-of-line blocking issues (message queues growing too large), then removed for production.
An option like that would, IMO, be a useful addition to the language.
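
In the meantime, something like this can be approximated from the outside with a watchdog process that polls process_info/2 and kills offenders. A rough sketch (the module name, example limits and polling interval are invented for illustration):

    -module(proc_limits).
    -export([watch/3]).

    %% Poll Pid every IntervalMs and kill it if any limit is exceeded.
    %% Limits is a list such as [{memory, 50*1024*1024}, {message_queue_len, 10000}].
    watch(Pid, Limits, IntervalMs) ->
        spawn(fun() -> loop(Pid, Limits, IntervalMs) end).

    loop(Pid, Limits, IntervalMs) ->
        Exceeded = lists:any(fun({Item, Max}) ->
                                     case erlang:process_info(Pid, Item) of
                                         {Item, Value} -> Value > Max;
                                         undefined     -> false  % process already dead
                                     end
                             end, Limits),
        case Exceeded of
            true  -> exit(Pid, kill);   % a supervisor, if any, sees the exit
            false -> timer:sleep(IntervalMs),
                     loop(Pid, Limits, IntervalMs)
        end.

It isn't the same as a real VM-enforced limit (the poll may fire late), and ETS usage would have to be checked separately via ets:info(Tab, memory), but it covers the per-process heap and message-queue cases.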
Cheers
Matt
From: chandrashekhar.mullaparthi@REDACTED
Date: Fri, 23 Dec 2011 07:44:49 +0000
To: jwatte@REDACTED
CC: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Fail fast

No, if BEAM cannot allocate more memory, the node just dies. In case you are interested, this handling of the OOM condition has been discussed on the mailing list in the past. Supervision hierarchies don't help in this case.



cheers
Chandru

On 23 December 2011 02:03, Jon Watte <jwatte@REDACTED> wrote:


If there were a proper supervision hierarchy all the way up to the "root" of the application, why would this happen? Wouldn't the supervisors just kill off whatever process ends up not being able to allocate memory, and thus clean up (perhaps kicking off users at the same time)? If it fails far enough up, wouldn't it basically reset the erl environment to "scratch"? Or would that be expecting too much from the supervision hierarchy?
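
For what it's worth, the model the question assumes looks roughly like this: every worker hangs off a supervisor, and the root supervisor of the application restarts whatever dies underneath it. A minimal sketch, with invented module names:

    -module(myapp_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    %% one_for_one: if a child exits, only that child is restarted,
    %% at most 5 times in 10 seconds before the supervisor itself gives up.
    init([]) ->
        {ok, {{one_for_one, 5, 10},
              [{myapp_logger,
                {myapp_logger, start_link, []},
                permanent, 5000, worker, [myapp_logger]}]}}.

The catch, as noted above, is that a supervisor can only react to exits it can observe; when the emulator itself fails to allocate memory there is no exit left to supervise.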



Sincerely,
jw
--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable. 

On Tue, Dec 20, 2011 at 6:23 PM, Chandru <chandrashekhar.mullaparthi@REDACTED> wrote:



Hello everyone,

I've just had a failure in one of my live services because an Erlang node ran out of memory (caused by a traffic spike). Restart mechanisms exist to restart the node, but it took a long time to die because it was writing a large erl_crash.dump file, and then there was a 7GB core dump.

Is there a quicker way to fail? I'm thinking of disabling core dumps entirely on the box. What else can I do? A configuration option on the node to only produce a summary erl_crash.dump would be nice. The most useful things for me in a crash dump are usually the slogan at the top and the message queue lengths of each process. In this particular case, the slogan would've told me all that I needed to know.
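
One way to fail faster at the node level is to watch total memory from inside the node and halt with a short slogan before the emulator hits the allocator wall; ERL_CRASH_DUMP can also be pointed at /dev/null if no dump is wanted at all. A rough sketch, with the threshold and polling interval picked purely for illustration:

    -module(mem_watchdog).
    -export([start/1]).

    %% Halt the node, with a short crash-dump slogan, once the total
    %% memory reported by erlang:memory(total) exceeds MaxBytes.
    start(MaxBytes) ->
        spawn(fun() -> loop(MaxBytes) end).

    loop(MaxBytes) ->
        case erlang:memory(total) > MaxBytes of
            true  -> erlang:halt("mem_watchdog: memory threshold exceeded");
            false -> timer:sleep(1000),
                     loop(MaxBytes)
        end.

This doesn't shrink the crash dump, but it fires while the box is still responsive, and the slogan says immediately why the node went down.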

Chandru


