[erlang-questions] System limit bringing down rex and the VM

Thu Sep 9 15:10:17 CEST 2010

I did read it... It's just not relevant. It doesn't matter whether it's my application or not. I have no control over the core system. The only thing I can do is mitigate the problem by trying not to spawn processes above some totally arbitrary threshold. Even if I had used only 80% of the limit, rex could suddenly receive tons of incoming calls and spawn the other 20% until it crashed.

It's a design flaw to crash a system due to a recoverable error. Rex doesn't crash when a remote node is down so why should it do so when it can't fulfill the rpc request because of other issues?
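
For illustration, a minimal sketch of what treating the limit as a recoverable error could look like in application code. The module name spawn_guard and both function names are made up for this example and are not part of OTP; the only real calls used are erlang:system_info/1 and spawn/1, which raises a system_limit error when the process table is full. This only covers processes spawned by the application itself. It does nothing about the processes rex spawns on behalf of remote callers, which is exactly the part that cannot be controlled from outside.

```erlang
-module(spawn_guard).  % hypothetical module name, for illustration only
-export([headroom/0, try_spawn/1]).

%% Fraction of the process table that is still free (between 0.0 and 1.0).
headroom() ->
    Limit = erlang:system_info(process_limit),
    Count = erlang:system_info(process_count),
    (Limit - Count) / Limit.

%% Spawn Fun, turning process-table exhaustion into an error tuple
%% instead of letting the calling process crash.
try_spawn(Fun) when is_function(Fun, 0) ->
    try spawn(Fun) of
        Pid -> {ok, Pid}
    catch
        error:system_limit -> {error, system_limit}
    end.
```

A caller could check spawn_guard:headroom() before starting a batch of work and back off or shed load on {error, system_limit}, rather than taking its supervisor down with it.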

-----Original Message-----
From: Paul Fisher [mailto:pfisher@REDACTED] 
Sent: Thursday, September 09, 2010 8:58 AM
To: <bile@REDACTED>
Cc: Chandru; Musumeci, Antonio S (Enterprise Infrastructure); erlang-questions@REDACTED
Subject: Re: [erlang-questions] System limit bringing down rex and the VM

And you apparently failed to read any of the rest of what I wrote.  The root cause of the problem is that your application is using > 99% of the available processes without any consideration for the limit.  This is a design flaw in your application.

On Sep 9, 2010, at 5:13 AM, <bile@REDACTED> wrote:

> On Wed, 8 Sep 2010 22:42:36 -0500
> Paul Fisher <pfisher@REDACTED> wrote:
>> 
>> There is no sane thing that the vm could do in the face of such 
>> process usage.  Erlang crashes so that you can figure out how to 
>> correct the design flaw, by looking at the dump.
>> 
> 
> That's simply false. The sane thing to do is to return an error so the 
> developer may attempt to handle the situation. There is no flaw.
> Hitting the max process limit is not a guaranteed fatal situation and 
> I should not be kept from attempting to respond to it.
> 
> The fact that this happens by chance due to rex and the supervisor's 
> setup rather than being built into the VM totally blows your argument 
> out of the water. Erlang doesn't crash in the normal sense.
> It's not a true fatal error. It could be triggered by calling 
> rpc:stop(). If it were an actual design decision it should be clearly 
> documented and part of the base VM behavior... not a side effect of 
> rex's design.

--
paul