[erlang-questions] On selective receive (Re: eep: multiple patterns)

Tue Jun 3 14:49:36 CEST 2008

2008/6/3 Sean Hinde <sean.hinde@REDACTED>:

>
> On 3 Jun 2008, at 12:30, Chandru wrote:
>
>
>> We already have a mechanism to restart if a queue grows too large
>> (actually 2 - process_info monitoring, and out of memory !)
>>
>>
>> I agree it is nearly impossible to predict this - but what options does a
>> programmer have without this bounded queue facility?
>>
>
> Well, I guess, mostly you need to have a design that doesn't lead to
> massive queue build up under sustained overload :-). This might mean input
> load regulation, or tweaking the process structure (the logger process
> problem).

Of course :-) But as you say, sometimes it is hard to predict, so the
design probably didn't cater for it.

> The system is unlikely to be performing to spec during this whole period of
> queue build up followed by cyclic restart - it doesn't really matter if the
> system restarts because it runs out of memory or cyclic restarts one process
> inside. It is still an outage for customers of the system.
>
> All you need to know is that it has crashed and why, so you can fix the
> bug. The erl_crash dump will tell you about the huge message queue.

I have seen Erlang nodes die a few times without producing an
erl_crash.dump. Sometimes it is because Ops got impatient and brutally
killed all Erlang-related processes. Even if you did allow the system to run
out of memory, on a system with a lot of memory that will take a long time.
All the while, the system will not be responding as it should.

>> 1. Introduce message queue monitoring for every process which is potentially
>> long lived, which imho is extra boilerplate code which reduces readability
>> of core functionality. Also there will be different ways of doing it
>> depending on how your process is structured (gen_fsm, gen_server, gen_event,
>> pure Erlang...). If all that one does upon detecting this condition is clear
>> the message queue by discarding messages, or terminate the process, wouldn't
>> it be good to have this built-in?
>
> Another option - fix the system so that it doesn't get into that state.

I'm all for fixing the system - all I'm asking for is facilities to detect
this with less pain.
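
To make the boilerplate in option 1 concrete, here is a minimal sketch of a
hand-rolled check, assuming a process is happy to exit once its own mailbox
passes a limit. The module name queue_guard and the functions check/1 and
queue_len/1 are hypothetical names for illustration; the only real API used
is erlang:process_info/2 with the message_queue_len item.

```erlang
-module(queue_guard).
-export([check/1, queue_len/1]).

%% Hypothetical helper: message queue length of a local Pid,
%% or 'undefined' if the process is no longer alive.
queue_len(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined
    end.

%% Hypothetical helper: call this at the top of a receive loop (or
%% from a gen_server callback). It exits the calling process when
%% its own mailbox holds more than Max messages, else returns ok.
check(Max) when is_integer(Max), Max >= 0 ->
    {message_queue_len, Len} =
        erlang:process_info(self(), message_queue_len),
    case Len > Max of
        true  -> exit({message_queue_overflow, Len});
        false -> ok
    end.
```

Every long-lived process would need a call such as queue_guard:check(10000)
somewhere in its loop, and the right place for that call differs between
gen_server, gen_fsm and plain receive loops, which is the repetition a
built-in facility would avoid.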

>>  3. Wait for the system to crash in live and then figure out what
>> happened.
>>
>> 2. Have another process which monitors the entire system - which is not
>> very scalable when you have hundreds of thousands of processes.
>
> Exactly. It is a bad bug that leads to such queue build up. Crashing is
> fine in this case, and probably preferable to lingering onwards silently
> failing to provide service.

Exactly my point. I guess we both agree that it should crash. The
disagreement seems to be about *when and how* it should crash. I would prefer
that the process in question crash because, in all probability, its callers
have timed out and are not expecting a response anyway.
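
For comparison, the external monitor from option 2 could be sketched roughly
as follows. This is only an illustration under stated assumptions: the module
name queue_watchdog, the functions start/2 and overloaded/1, and the choice
to use exit(Pid, kill) are all hypothetical, not an existing facility. It
scans every local process on each pass, which is exactly the scalability
cost mentioned above.

```erlang
-module(queue_watchdog).
-export([start/2, overloaded/1]).

%% Hypothetical helper: list {Pid, Len} for every local process whose
%% mailbox is longer than Max. Processes that die during the scan
%% return 'undefined' from process_info/2 and are skipped.
overloaded(Max) ->
    [{Pid, Len}
     || Pid <- erlang:processes(),
        {message_queue_len, Len} <-
            [erlang:process_info(Pid, message_queue_len)],
        Len > Max].

%% Hypothetical entry point: spawn a watchdog that scans every
%% IntervalMs milliseconds and kills any process over the limit.
start(Max, IntervalMs) ->
    spawn(fun() -> loop(Max, IntervalMs) end).

loop(Max, IntervalMs) ->
    Self = self(),
    %% 'kill' is used because any other reason sent to a process that
    %% traps exits would just become one more message in its queue.
    lists:foreach(fun({Pid, _Len}) -> exit(Pid, kill) end,
                  [PL || {Pid, _} = PL <- overloaded(Max), Pid =/= Self]),
    timer:sleep(IntervalMs),
    loop(Max, IntervalMs).
```

A real version would at least need to exempt system processes and log the
offender before killing it. Note also that kill bypasses terminate/2, so the
victim gets no chance to clean up; its supervisor simply restarts it.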

cheers
Chandru