[erlang-questions] System limit bringing down rex and the VM

Musumeci, Antonio S Antonio.Musumeci@REDACTED
Wed Sep 8 19:46:36 CEST 2010

I'm sorry, but that's ridiculous. Limits exist to *limit*, not to crash. I see no reason why rex should bring down the VM because it can't spawn a child to handle an rpc request. When a node is down, rpc returns {badrpc,nodedown}. If a process can't be spawned, shouldn't it return a similar error? Or at least be configurable so that the supervisor doesn't exit and bring down the entire system?

[1] That's not what is being argued. Obviously spawn will fail if there are too many processes. The problem is that rex exits when that happens, which causes the entire system to fail. Does any modern OS suddenly reboot when the same type of limit is reached? Do they shut down when virtual memory is exhausted? No, they return errors and let the developer handle the issue as well as the situation allows.

[2] The Linux kernel, for example, keeps a certain amount of memory for itself. Besides, just because you can't acquire some resource at T0 does not mean that 1) you need the same resource to handle the error, and/or 2) it won't be available at time T1.

Increasing the process count isn't a solution. It's a hack that pushes the problem out further.
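
To illustrate the kind of behaviour I mean, here is a minimal sketch, not how rex is actually implemented. The module and function names are hypothetical; the only assumption about the runtime is that spawn raises the system_limit error when the process table is full.

```erlang
-module(safe_spawn).
-export([spawn_or_error/1]).

%% Hypothetical helper, not part of OTP: try to spawn Fun and turn a
%% system_limit error into a tagged tuple the caller can handle,
%% in the spirit of {badrpc,nodedown}.
spawn_or_error(Fun) when is_function(Fun, 0) ->
    try
        {ok, erlang:spawn(Fun)}
    catch
        error:system_limit ->
            %% Process table is full right now; the caller may retry
            %% later or report the failure instead of crashing.
            {error, system_limit}
    end.
```

A caller would match on {ok, Pid} or {error, system_limit} and decide for itself what to do, rather than exiting.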

-----Original Message-----
From: Jachym Holecek [mailto:freza@REDACTED] 
Sent: Wednesday, September 08, 2010 1:08 PM
To: Musumeci, Antonio S (Enterprise Infrastructure)
Cc: Igor Ribeiro Sucupira; erlang-questions@REDACTED
Subject: Re: [erlang-questions] System limit bringing down rex and the VM

# Musumeci, Antonio S 2010-09-08:
> OS limits? This is the BEAM process limit. [...]

Which you can adjust with the '+P <number>' emulator flag to a sufficiently large value and be done with it, in the spirit of what Igor suggests.
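
As a rough sketch of how to check what the emulator is currently running with, the limit and the live process count can be read via erlang:system_info/1. The module and function names here are hypothetical, for illustration only.

```erlang
-module(proc_headroom).
-export([headroom/0]).

%% Hypothetical helper: returns {Limit, Count, Remaining}, where Limit
%% is the process limit in effect (as set with +P), Count is the number
%% of processes currently alive, and Remaining is the difference.
headroom() ->
    Limit = erlang:system_info(process_limit),
    Count = erlang:system_info(process_count),
    {Limit, Count, Limit - Count}.
```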

You're supposed to provide the runtime system with enough resources (be it OS or emulator settings) to handle the expected load. Dealing with this kind of error "more gracefully" would be too much pain[1] or simply impossible[2], AFAIU.

	-- Jachym

[1] Process table limit -- every single call to any variant of spawn() may fail
    this way. There's plenty of those. There's a similar limit on the number of
    ETS tables.

[2] Out of memory -- if there's no memory, where do you get the memory to deal
    with OOM error?
