[erlang-questions] Programming question

Thu Jan 25 17:33:21 CET 2007

On 25 Jan 2007, at 12:35, Richard Carlsson wrote:

> Sean Hinde wrote:
>>
>> The problem as I see it is that the calling process only
>> sometimes gets its 'EXIT' message - it depends on context.
>
> If you have a library that uses the RPC model, as in the case of
> 'gen_server:call(...)', it is probably a bad idea to try to solve the
> problems with RPC that have been known for ages, such as "what do I do
> if the server goes down", by adding some ad-hoc handling code to every
> remote call. (It can and will be screwed up anyway.) I think that the
> interface should be used as it was intended (treating exceptions due to
> server-down as any other exception out from the call), and that
> additional supervision should be placed somewhere else, outside the
> main program logic.
>
> Sean is basically right here: he _ought_ to be able to use normal
> links for this purpose (after all, links are the central built-in
> "additional supervision" method in Erlang), regardless of whether
> the implementation of gen_server:call() does things with links and
> trapping of signals: that stuff should have been made transparent to
> the user, but is obviously not. (One problem is that there can only
> be a single link between two processes, so gen_server can't know
> whether or not it should re-issue the caught signal to the caller.)

Actually it can, because gen_server can rely solely on a monitor for
its own purposes. If it gets an 'EXIT' message then it can be certain
that it is because the two processes have been explicitly linked.

I would be happy to have a compatibility mode for dealing with old
nodes, but I think the default behaviour should be for gen_server to
selectively receive its own 'DOWN' message, and leave the 'EXIT'
message on the queue.
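
A minimal sketch of that idea, assuming a hand-rolled call protocol. This
is not the gen_server source: the module name, the '$sketch_call' message
tag and demo/0 are made up for illustration. The point is only that the
receive matches the monitor's own 'DOWN' message and nothing else, so an
'EXIT' message from an explicit link stays in the mailbox.

```erlang
-module(monitor_call).
-export([call/3, demo/0]).

%% Hypothetical monitor-only call (NOT the real gen_server:call/3).
%% The '$sketch_call' tag is an assumption made for this example.
call(Server, Request, Timeout) ->
    MRef = erlang:monitor(process, Server),
    Server ! {'$sketch_call', {self(), MRef}, Request},
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _Pid, Reason} ->
            %% Only our own monitor message is consumed here. Any
            %% {'EXIT', Server, Reason} from a link is left untouched.
            exit({Reason, {?MODULE, call, [Server, Request]}})
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        exit({timeout, {?MODULE, call, [Server, Request]}})
    end.

%% Linked caller with trap_exit = true and the call wrapped in a catch.
%% The toy server crashes as soon as it sees the request.
demo() ->
    Old = process_flag(trap_exit, true),
    Server = spawn_link(fun() ->
        receive {'$sketch_call', _From, crash} -> exit(boom) end
    end),
    Result = (catch call(Server, crash, 1000)),
    %% The link's 'EXIT' message should still be waiting in the mailbox.
    Exit = receive
               {'EXIT', Server, R} -> {got_exit, R}
           after 1000 ->
               no_exit
           end,
    process_flag(trap_exit, Old),
    {Result, Exit}.
```

With this sketch, monitor_call:demo() returns a pair: the caught exit
from the call, followed by {got_exit, boom} for the 'EXIT' message that
was left on the queue.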

>
> If this aspect of gen_server (and similar library functions) cannot
> be fixed, e.g. by using monitors instead of links, then at a minimum it
> should be documented that the functions will steal exit signals if you
> try to link directly to the server.

I agree with the documentation comment. It was extremely surprising  
the first time I saw this behaviour. It resulted in several outages  
of live systems where processes were not restarted simply because of  
when they died (not code written by me, so at least two folks have  
had this problem). There must be many other systems out there that  
are just waiting to suffer the same fate.

>
> Meanwhile, the fix I suggested previously should work fine: use an
> intermediate process, whose signals the gen_server library does not
> interfere with.

To require a 3rd process between the two linked processes just to
propagate the EXIT seems like extreme overkill. In my current
application the two processes are dynamically created per call - this
would add a 50% overhead to every request.
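
For reference, the intermediate process could look roughly like the
sketch below. The module name exit_relay and start_link/1 are
hypothetical, and this is only one way to read the suggestion: the caller
links to a relay instead of to the server, and the relay re-raises the
server's exit reason so that it reaches the caller through a link the
call never touches.

```erlang
-module(exit_relay).
-export([start_link/1]).

%% Hypothetical helper: spawn a relay process linked to the caller.
%% The relay monitors Server and, when Server goes down, exits with the
%% same reason, so the exit signal travels over the caller<->relay link.
start_link(Server) ->
    spawn_link(fun() ->
        MRef = erlang:monitor(process, Server),
        receive
            {'DOWN', MRef, process, _Pid, Reason} ->
                exit(Reason)
        end
    end).
```

Note that a caller trapping exits sees {'EXIT', RelayPid, Reason} rather
than the server's pid, and that one extra process is spawned per linked
pair, which is the overhead in question.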

I don't buy the backwards compatibility argument for this unintuitive  
and IMO buggy behaviour. If we look at the cases:

1. Two processes are not linked.

Today - if the other process dies during the call then gen:call()  
just throws an exception.
With my change - exactly the same.

2. Two processes are linked, with the gen:call not wrapped in a catch

Today - if the other process dies during the call it throws an  
exception and the local process dies
With my change - same result, the 'EXIT' message arrives later after  
the caller died

3. Two processes are linked, gen:call wrapped in a catch,
trap_exit = true

Today - if the other process dies during the call then an exception  
is caught. There is no 'EXIT' message, even though this has to be  
handled if the linked process dies at any other time.
With my change - the same exception is raised from the call, but the
existing 'EXIT' message handling will also be invoked. To me this
is a pure bug fix.

4. Two processes are linked, gen:call wrapped in a catch,
trap_exit = false

Today - exception is raised as normal, and the calling process lives on.
With my change - exception is raised as normal but the calling
process is killed later by the 'EXIT' signal.

This last case could be seen as a backwards compatibility problem,  
but given that the called process can potentially die at any time  
outside the call, I would say that the gen_server behaviour is just  
hiding a latent bug in the original code, which is likely to happen  
at some point anyway.
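
A small sketch of that latent bug, with made-up names (latent_bug,
demo/0) and a toy server: a linked caller that does not trap exits is
taken down when the other process dies while no call is in progress at
all.

```erlang
-module(latent_bug).
-export([demo/0]).

%% Hypothetical demo: the caller is linked, trap_exit = false, and is
%% idle (not inside any call) when the server dies.
demo() ->
    Parent = self(),
    Server = spawn(fun() -> receive crash -> exit(boom) end end),
    {Caller, MRef} = spawn_monitor(fun() ->
        link(Server),
        Parent ! {self(), linked},
        %% Idle forever; no message with this shape is ever sent.
        receive never_sent -> ok end
    end),
    %% Wait until the link is in place before crashing the server.
    receive {Caller, linked} -> ok end,
    Server ! crash,
    receive
        {'DOWN', MRef, process, Caller, Reason} -> {caller_died, Reason}
    after 1000 ->
        caller_survived
    end.
```

Here latent_bug:demo() returns {caller_died, boom}: the exit signal kills
the caller regardless of any catch it might use around its calls.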

Worst case, we could define a separate call with the new behaviour.

Sean