[erlang-questions] Fail fast

Matthew Evans mattevans123@REDACTED
Fri Dec 23 19:37:25 CET 2011

I might take a look at that.

Subject: Re: [erlang-questions] Fail fast
From: tony@REDACTED
Date: Fri, 23 Dec 2011 18:32:32 +0100
CC: mattevans123@REDACTED; erlang-questions@REDACTED
To: chandrashekhar.mullaparthi@REDACTED

If you are really, really interested in this kind of thing, then maybe you could have a look at github.com/tonyrog/otp, in the branch limits.
It basically implements some new options to spawn_opt to set up resource limits like:
Limits are inherited by spawned processes, and some of the resources are shared among the spawned processes to form a kind of resource group.
The implementation is experimental and was made in order to demonstrate the features for the OTP group. Time for an EEP?
Flavors of max_message_queue_len could be: let sender crash, receiver crash, sender block (not so erlangish), drop head of message queue, drop tail of message queue.
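
To make the "drop head" and "drop tail" flavors concrete, here is a minimal sketch of a bounded buffer with both overflow policies. It is not taken from the limits branch: the module name bounded_queue, its functions, and the policy atoms are hypothetical, and it assumes "drop tail" means discarding the newest (incoming) message while "drop head" means discarding the oldest one.

```erlang
-module(bounded_queue).
-export([new/2, push/2, to_list/1]).

%% Hypothetical bounded buffer illustrating two overflow policies.
%% Max must be a positive integer; Policy is drop_head | drop_tail.
new(Max, Policy) when is_integer(Max), Max > 0,
                      (Policy =:= drop_head orelse Policy =:= drop_tail) ->
    {Max, Policy, 0, queue:new()}.

%% Room left: append the message.
push(Msg, {Max, Policy, Len, Q}) when Len < Max ->
    {Max, Policy, Len + 1, queue:in(Msg, Q)};
%% Full, drop_head: discard the oldest message, then append the new one.
push(Msg, {Max, drop_head, Len, Q}) ->
    {_Oldest, Q1} = queue:out(Q),
    {Max, drop_head, Len, queue:in(Msg, Q1)};
%% Full, drop_tail (assumed meaning): discard the incoming message.
push(_Msg, {_Max, drop_tail, _Len, _Q} = State) ->
    State.

%% Buffered messages, oldest first.
to_list({_Max, _Policy, _Len, Q}) ->
    queue:to_list(Q).
```

A process could keep such a buffer in its own state and drain its real mailbox into it. That only approximates a VM-level limit, since the mailbox itself stays unbounded.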


On 23 Dec 2011, at 18:04, Chandru wrote:

Yes, I agree. I asked for "bounded message queues" a long time ago, but I got voted down :(



On 23 December 2011 14:44, Matthew Evans <mattevans123@REDACTED> wrote:

Email got sent too soon.....
The ability to handle this is a "feature" that I think is missing from Erlang. The VM and the language are very stable, but the fact that a single, poorly behaving process can cause the VM to die is pretty undesirable. I had a bug in a "logging" process where an ETS table wasn't getting purged properly. It grew and grew, eventually bringing down the entire VM due to an OOM condition. This process wasn't significant to the operation of the system (and if I wanted it to be I would've written a supervisor to manage it), yet it killed a critical service.

My personal wish would be the ability to optionally apply limits to a process when it is spawned (memory, ETS table sizes, and message queue length would be a good start). When one or more of the limits are exceeded, the process can be killed (and then trapped/supervised if needed). It would make the VM more stable, and would also assist in debugging (since it would be easy to see in the SASL logs what happened without needing to look at a crash dump). Another advantage is the ability to assist in testing: the limits could be set temporarily to find possible memory hogs and issues with head-of-line blocking (message queues growing too much). Those limits would be removed for production.
An option like that would, IMO, be a useful addition to the language.
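
Until something like this exists in the VM, the idea can be approximated from ordinary Erlang code with a polling watchdog. The sketch below is only an illustration under stated assumptions: the module limit_watchdog, the function spawn_limited/2, the limit keys max_queue_len and max_memory, and the 100 ms polling interval are all hypothetical. It uses maps, which newer OTP releases provide, and it does not cover ETS table sizes.

```erlang
-module(limit_watchdog).
-export([spawn_limited/2]).

%% Hypothetical helper: spawn Fun and kill it if it exceeds the given limits.
%% Limits is a map such as #{max_queue_len => 1000, max_memory => 50000000}
%% (memory in bytes, as reported by process_info/2). Polling is approximate:
%% a process can overshoot a limit between two checks.
spawn_limited(Fun, Limits) when is_function(Fun, 0), is_map(Limits) ->
    Pid = spawn(Fun),
    _Watchdog = spawn(fun() -> watch(Pid, Limits) end),
    Pid.

watch(Pid, Limits) ->
    case process_info(Pid, [message_queue_len, memory]) of
        undefined ->
            ok;  % the watched process has already exited
        [{message_queue_len, QLen}, {memory, Mem}] ->
            %% The default atom 'infinity' compares greater than any number
            %% in Erlang term order, so a missing limit never triggers.
            MaxQ = maps:get(max_queue_len, Limits, infinity),
            MaxM = maps:get(max_memory, Limits, infinity),
            if
                QLen > MaxQ ->
                    exit(Pid, {limit_exceeded, message_queue_len, QLen});
                Mem > MaxM ->
                    exit(Pid, {limit_exceeded, memory, Mem});
                true ->
                    timer:sleep(100),
                    watch(Pid, Limits)
            end
    end.
```

A supervisor or monitor of the limited process would see the {limit_exceeded, ...} exit reason, which gives the "easy to see in the logs" behaviour described above. A process that traps exits would receive the signal as a message instead of dying, so a stricter version might fall back to exit(Pid, kill).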
From: chandrashekhar.mullaparthi@REDACTED
Date: Fri, 23 Dec 2011 07:44:49 +0000
To: jwatte@REDACTED
CC: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Fail fast

No, if BEAM cannot allocate more memory, the node just dies. In case you are interested, the handling of OOM conditions has been discussed on the mailing list in the past. Supervision hierarchies don't help in this case.


On 23 December 2011 02:03, Jon Watte <jwatte@REDACTED> wrote:

If there were a proper supervision hierarchy all the way up to the "root" of the application, why would this happen? Wouldn't the supervisors just kill off whatever process ends up not being able to allocate memory, and thus clean up? (Perhaps kicking off users at the same time.) If it fails far enough up, wouldn't it basically reset the erl environment to "scratch"? Or would that be expecting too much from the supervision hierarchy?

Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable. 

On Tue, Dec 20, 2011 at 6:23 PM, Chandru <chandrashekhar.mullaparthi@REDACTED> wrote:

Hello everyone,

I've just had a failure in one of my live services because an Erlang node ran out of memory (caused by a traffic spike). Restart mechanisms exist to restart the node, but the node took a long time to die because it was writing a large erl_crash.dump file, and then there was a 7GB core dump.

Is there a quicker way to fail? I'm thinking of disabling core dumps entirely on the box. What else can I do? A configuration option on the node to only produce a summary erl_crash.dump would be nice. The most useful things for me in a crash dump usually are the slogan at the top, and the message queue lengths of each process. In this particular case, the slogan would've told me all that I needed to know.
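
Since the message queue lengths are named above as one of the most useful parts of a crash dump, one partial workaround is to collect them from the live node, for example in a periodic log entry, so that they are available even when no full dump is written. The following is a minimal sketch, not an existing OTP facility: the module queue_report and the function top_queues/1 are hypothetical names.

```erlang
-module(queue_report).
-export([top_queues/1]).

%% Return up to N {Pid, QueueLen} pairs, longest message queue first.
%% Processes that exit while being inspected make process_info/2 return
%% 'undefined'; the pattern in the second generator silently skips them.
top_queues(N) when is_integer(N), N >= 0 ->
    Lens = [{Pid, Len}
            || Pid <- erlang:processes(),
               {message_queue_len, Len} <- [process_info(Pid, message_queue_len)]],
    Sorted = lists:sort(fun({_, A}, {_, B}) -> A >= B end, Lens),
    lists:sublist(Sorted, N).
```

Calling queue_report:top_queues(10) from a shell or a logging process gives a snapshot only. It does not shorten the time the node takes to die, and it is no help if the node runs out of memory between two snapshots.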



erlang-questions mailing list




"Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix"
