[erlang-questions] System limit bringing down rex and the VM

Igor Ribeiro Sucupira igorrs@REDACTED
Thu Sep 9 05:13:35 CEST 2010


I think you noticed I wasn't trying to justify the crash: I was just
being realistic and telling you not to rely on graceful failures.

I personally set limits in a way that, if the VM crashes because of
them, it's probably because something was already very wrong and my
application wouldn't be able to work properly anyway.
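
To make that concrete, here is a minimal sketch of how one might check how
close a node is to the BEAM process limit (the one set with the +P emulator
flag). The module and function names are made up for illustration; only
erlang:system_info/1 with process_limit and process_count comes from the
runtime.

```erlang
-module(limit_check).
-export([headroom/0, near_limit/1]).

%% Number of processes that can still be created before the VM's
%% process limit (configured with the +P emulator flag) is reached.
headroom() ->
    erlang:system_info(process_limit) - erlang:system_info(process_count).

%% True when fewer than MinFree process slots remain.
%% MinFree is a hypothetical safety margin chosen by the application.
near_limit(MinFree) when is_integer(MinFree), MinFree >= 0 ->
    headroom() < MinFree.
```

A monitoring process could call limit_check:near_limit/1 periodically and
raise an alarm well before spawn calls start failing.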

But I understand this is not acceptable in some scenarios. You might
want to limit the resources used by your application (for example,
because there are other applications running on the same server) and
keep it running with limited functionality if some limit is reached. I
remember having written at least one C++ application that keeps doing
what it can if it's not possible to spawn a thread.
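
The Erlang equivalent of that idea could look roughly like the sketch below:
wrap the spawn so that hitting the process limit becomes an ordinary return
value instead of a crash in the caller. The module name safe_spawn and the
function try_spawn/1 are hypothetical, and this is not how rpc (rex) is
implemented; it only shows that the system_limit error raised by a spawn can
be caught.

```erlang
-module(safe_spawn).
-export([try_spawn/1]).

%% Try to spawn and monitor Fun.
%% Returns {ok, Pid, MonitorRef} on success, or {error, system_limit}
%% if the VM cannot create another process.
try_spawn(Fun) when is_function(Fun, 0) ->
    try erlang:spawn_monitor(Fun) of
        {Pid, Ref} -> {ok, Pid, Ref}
    catch
        %% Raised when the process limit has been reached.
        error:system_limit -> {error, system_limit}
    end.
```

The caller can then decide what "limited functionality" means, for example
by rejecting the request or doing the work inline instead of in a new
process.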

Just telling you it might be an ambitious goal.

Igor.

On Wed, Sep 8, 2010 at 1:22 PM, Musumeci, Antonio S
<Antonio.Musumeci@REDACTED> wrote:
>
> OS limits? This is the BEAM process limit. Rex doesn't gracefully handle the error, which in this case is triggered by mnesia_recover. I don't see the point in rex not handling it more gracefully, either by
> returning the error to the caller or by restarting. Rex may be a base process, but IMO it's not important enough to cause the VM to shut down. I'd rather not have to implement OOM-killer-like behavior or some
> other watchdog process just so my VM doesn't crash from hitting otherwise arbitrary max process limits.
>
> -----Original Message-----
> From: Igor Ribeiro Sucupira [mailto:igorrs@REDACTED]
> Sent: Wednesday, September 08, 2010 11:29 AM
> To: Musumeci, Antonio S (Enterprise Infrastructure)
> Cc: erlang-questions@REDACTED
> Subject: Re: [erlang-questions] System limit bringing down rex and the VM
>
> I know this is obvious, but the first step should be adjusting your OS limits so that your system, under expected conditions, never fails because of them.
> I have seen a couple of funny behaviours when exhausting system limits in the QA environment, so I wouldn't count on graceful failures from Erlang/Mnesia/etc.
>
> Good luck.
> Igor.
>
> On Wed, Sep 8, 2010 at 10:35 AM, Musumeci, Antonio S <Antonio.Musumeci@REDACTED> wrote:
> > Is this expected behavior? Shouldn't there be a way to handle this type of thing more gracefully?
> >
> > =ERROR REPORT==== 8-Sep-2010::09:28:26 ===
> > ** Generic server rex terminating
> > ** Last message in was {call,mnesia_lib,get_node_number,[],<0.55.0>}
> > ** When Server state == {0,nil}
> > ** Reason for termination ==
> > ** {system_limit,[{erlang,spawn_opt,
> >                          [{erlang,apply,
> >                                   [#Fun<rpc.1.93176247>,[]],
> >                                   [monitor]}]},
> >                  {erlang,spawn_monitor,1},
> >                  {rpc,handle_call_call,6},
> >                  {gen_server,handle_msg,5},
> >                  {proc_lib,init_p_do_apply,3}]}
> >
> > =CRASH REPORT==== 8-Sep-2010::09:28:26 ===
> >  crasher:
> >    initial call: rpc:init/1
> >    pid: <0.12.0>
> >    registered_name: rex
> >    exception exit: {system_limit,
> >                        [{erlang,spawn_opt,
> >                             [{erlang,apply,
> >                                  [#Fun<rpc.1.93176247>,[]],
> >                                  [monitor]}]},
> >                         {erlang,spawn_monitor,1},
> >                         {rpc,handle_call_call,6},
> >                         {gen_server,handle_msg,5},
> >                         {proc_lib,init_p_do_apply,3}]}
> >      in function  gen_server:terminate/6
> >    ancestors: [kernel_sup,<0.10.0>]
> >    messages: []
> >    links: [<0.11.0>]
> >    dictionary: []
> >    trap_exit: true
> >    status: running
> >    heap_size: 377
> >    stack_size: 24
> >    reductions: 143
> >  neighbours:
> >
> > =SUPERVISOR REPORT==== 8-Sep-2010::09:28:26 ===
> >     Supervisor: {local,kernel_sup}
> >     Context:    child_terminated
> >     Reason:     {system_limit,
> >                     [{erlang,spawn_opt,
> >                          [{erlang,apply,
> >                               [#Fun<rpc.1.93176247>,[]],
> >                               [monitor]}]},
> >                      {erlang,spawn_monitor,1},
> >                      {rpc,handle_call_call,6},
> >                      {gen_server,handle_msg,5},
> >                      {proc_lib,init_p_do_apply,3}]}
> >     Offender:   [{pid,<0.12.0>},
> >                  {name,rex},
> >                  {mfargs,{rpc,start_link,[]}},
> >                  {restart_type,permanent},
> >                  {shutdown,2000},
> >                  {child_type,worker}]
> >
> >
> > =SUPERVISOR REPORT==== 8-Sep-2010::09:28:26 ===
> >     Supervisor: {local,kernel_sup}
> >     Context:    shutdown
> >     Reason:     reached_max_restart_intensity
> >     Offender:   [{pid,<0.12.0>},
> >                  {name,rex},
> >                  {mfargs,{rpc,start_link,[]}},
> >                  {restart_type,permanent},
> >                  {shutdown,2000},
> >                  {child_type,worker}]
> >
> >
> > =ERROR REPORT==== 8-Sep-2010::09:28:26 ===
> > Mnesia(igo@REDACTED): ** ERROR ** mnesia_recover got unexpected info: {'EXIT',<0.84.0>,shutdown}
>
> --
> "The secret of joy in work is contained in one word - excellence. To know how to do something well is to enjoy it." - Pearl S. Buck.




