[erlang-questions] Programming question

Thu Jan 25 18:23:32 CET 2007

> To require a 3rd process between the two linked processes just 
> to propagate the EXIT seems like extreme overkill. In my 
> current application the two processes are dynamically created 
> per call - this would add a 50% overhead to every request.

To be fair, this would be 50% overhead on the cost of 
spawning processes, which is about 5-10 us/spawn.
It is hardly noticeable in most cases.
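
For anyone who wants to check the spawn cost on their own hardware,
here is a rough sketch. The module name spawn_cost and the function
measure/1 are made up for this mail, and the number you get depends
heavily on the machine and emulator, so treat it as an estimate only.

```erlang
-module(spawn_cost).
-export([measure/1]).

%% Returns the average wall-clock cost of one spawn/1, in microseconds,
%% measured over N spawns of a process that exits immediately.
measure(N) when is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> spawn_n(N) end),
    Micros / N.

%% Spawn N trivial processes, one after another.
spawn_n(0) ->
    ok;
spawn_n(N) ->
    spawn(fun() -> ok end),
    spawn_n(N - 1).
```

Calling spawn_cost:measure(10000) from the shell gives a per-spawn
figure to compare against the total cost of handling one request.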

I agree that gen_server shouldn't touch the EXIT 
messages. While changing this might cause some 
consternation among a few users with legacy code, 
I believe the current behaviour violates the 
principle of least astonishment and, as you've
described, can lead to highly unexpected timing bugs.

gen:call() doesn't enable exit trapping under any 
circumstances (nor does it link to the server), 
so if the process is trapping exits, and is linked
to the server, it is certainly because someone else 
made it so. It is then reasonable to expect that 
some other code expects an EXIT message to arrive.
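
To make the idea concrete, here is a minimal sketch of a call that
relies only on a monitor and never touches 'EXIT' messages. This is
not the actual gen:call() code: the module name mon_call, the
{call, {Pid, Ref}, Request} message format and the demo/0 function
are assumptions made up for illustration.

```erlang
-module(mon_call).
-export([call/3, demo/0]).

%% Synchronous call using a monitor only. The selective receive matches
%% the reply or the 'DOWN' for our own monitor reference, so an 'EXIT'
%% message from a linked server stays in the caller's mailbox.
call(Server, Request, Timeout) ->
    Ref = erlang:monitor(process, Server),
    Server ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, _Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% The caller traps exits and is linked to a server that dies in the
%% middle of the call. The call reports the failure, and the 'EXIT'
%% message is still there for the caller's normal 'EXIT' handling.
demo() ->
    Old = process_flag(trap_exit, true),
    Server = spawn_link(fun() ->
                                receive
                                    {call, _From, stop} -> exit(crashed)
                                end
                        end),
    Result = call(Server, stop, 1000),
    Exit = receive
               {'EXIT', Server, Reason} -> {exit_seen, Reason}
           after 1000 ->
               no_exit
           end,
    process_flag(trap_exit, Old),
    {Result, Exit}.
```

With this approach the caller sees both the failed call and the
'EXIT' message, which is what code that traps exits and links to the
server would expect.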

One of the main reasons why monitors were added in
the first place was the problem that link/unlink
and EXIT message handling can't be handled locally,
since no matter how many times you call link(), 
there will only be one link, and the first call
to unlink removes that link. Furthermore, there 
is only one EXIT message (except if link(DeadPid)
has been called one or more times), and the 
handling of that message must be a process-global
matter.
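
A small sketch of the difference, using only standard BIFs. The
module name link_vs_monitor and the function demo/0 are made up for
this mail.

```erlang
-module(link_vs_monitor).
-export([demo/0]).

demo() ->
    Pid = spawn(fun() -> receive stop -> ok end end),

    %% Two unrelated pieces of code each call link/1 ...
    true = link(Pid),
    true = link(Pid),
    {links, Links1} = process_info(self(), links),
    LinkedTwice = length([P || P <- Links1, P =:= Pid]),

    %% ... but there is only one link, and a single unlink/1 removes it.
    true = unlink(Pid),
    {links, Links2} = process_info(self(), links),
    AfterUnlink = length([P || P <- Links2, P =:= Pid]),

    %% Monitors are independent: every call returns its own reference,
    %% and removing one does not affect the other.
    Ref1 = erlang:monitor(process, Pid),
    Ref2 = erlang:monitor(process, Pid),
    true = erlang:demonitor(Ref1, [flush]),
    Pid ! stop,
    Down = receive
               {'DOWN', Ref2, process, Pid, Reason} -> Reason
           after 1000 ->
               timeout
           end,
    {LinkedTwice, AfterUnlink, Ref1 =/= Ref2, Down}.
```

The first two elements of the result show that the second link/1 call
added nothing and that one unlink/1 removed the only link. The last
two show that the second monitor survived the removal of the first
and still delivered its own 'DOWN' message.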

BR,
Ulf W

> -----Original Message-----
> From: erlang-questions-bounces@REDACTED 
> [mailto:erlang-questions-bounces@REDACTED] On Behalf Of Sean Hinde
> Sent: den 25 januari 2007 17:33
> To: Erlang Questions
> Subject: Re: [erlang-questions] Programming question
> 
> 
> On 25 Jan 2007, at 12:35, Richard Carlsson wrote:
> 
> > Sean Hinde wrote:
> >>
> >> The problem as I see it is that the calling process only sometimes
> >> gets its 'EXIT' message - it depends on context.
> >
> > If you have a library that uses the RPC model, as in the case of
> > 'gen_server:call(...)', it is probably a bad idea to try to solve the
> > problems with RPC that have been known for ages, such as "what do I do
> > if the server goes down", by adding some ad-hoc handling code to every
> > remote call. (It can and will be screwed up anyway.) I think that the
> > interface should be used as it was intended (treating exceptions due
> > to server-down as any other exception out from the call), and that
> > additional supervision should be placed somewhere else, outside the
> > main program logic.
> >
> > Sean is basically right here: he _ought_ to be able to use normal
> > links for this purpose (after all, links are the central built-in
> > "additional supervision" method in Erlang), regardless of whether the
> > implementation of gen_server:call() does things with links and
> > trapping of signals: that stuff should have been made transparent to
> > the user, but is obviously not. (One problem is that there can only be
> > a single link between two processes, so gen_server can't know whether
> > or not it should re-issue the caught signal to the caller.)
> 
> Actually it can, because gen_server can rely solely on 
> monitor for its own purposes. If it gets an 'EXIT' message 
> then it can be certain that it is because the two processes 
> have been explicitly linked.
> 
> I would be happy to have a compatibility mode for dealing 
> with old nodes, but I think the default behaviour should be for 
> gen_server to selectively receive its own 'DOWN' message, and 
> leave the EXIT message on the queue.
> 
> 
> >
> > If this aspect of gen_server (and similar library functions) cannot be
> > fixed, e.g. by using monitors instead of links, then at a minimum it
> > should be documented that the functions will steal exit signals if you
> > try to link directly to the server.
> 
> I agree with the documentation comment. It was extremely 
> surprising the first time I saw this behaviour. It resulted 
> in several outages of live systems where processes were not 
> restarted simply because of when they died (not code written 
> by me, so at least two folks have had this problem). There 
> must be many other systems out there that are just waiting to 
> suffer the same fate.
> 
> >
> > Meanwhile, the fix I suggested previously should work fine: use an 
> > intermediate process, whose signals the gen_server library does not 
> > interfere with.
> 
> To require a 3rd process between the two linked processes just 
> to propagate the EXIT seems like extreme overkill. In my 
> current application the two processes are dynamically created 
> per call - this would add a 50% overhead to every request.
> 
> I don't buy the backwards compatibility argument for this 
> unintuitive and IMO buggy behaviour. If we look at the cases:
> 
> 1. Two processes are not linked.
> 
> Today - if the other process dies during the call then 
> gen:call() just throws an exception.
> With my change - exactly the same
> 
> 2. Two processes are linked, with the gen:call not wrapped in a catch
> 
> Today - if the other process dies during the call it throws 
> an exception and the local process dies.
> With my change - same result, the 'EXIT' message arrives later,
> after the caller died.
> 
> 3. Two processes are linked, gen:call wrapped in a catch, 
> trapexit = true
> 
> Today - if the other process dies during the call then an 
> exception is caught. There is no 'EXIT' message, even though 
> this has to be handled if the linked process dies at any other time.
> With my change - The same exception is raised from the call, 
> but the existing 'EXIT' message handling will also be 
> invoked. - To me this is a pure bug fix
> 
> 4. Two processes are linked, gen:call wrapped in a catch, 
> trapexit = false
> 
> Today - Exception is raised as normal, and the calling 
> process lives on.
> With proposed change: Exception is raised as normal but the 
> calling process is killed later by the 'EXIT' signal.
> 
> This last case could be seen as a backwards compatibility 
> problem, but given that the called process can potentially 
> die at any time outside the call, I would say that the 
> gen_server behaviour is just hiding a latent bug in the 
> original code, which is likely to happen at some point anyway.
> 
> Worst case we could have a separate call defined with the new 
> behaviour
> 
> Sean