[erlang-questions] Programming question
Thu Jan 25 17:33:21 CET 2007
On 25 Jan 2007, at 12:35, Richard Carlsson wrote:
> Sean Hinde wrote:
>> The problem as I see it is that the calling process only
>> sometimes gets its 'EXIT' message - it depends on context.
> If you have a library that uses the RPC model, as in the case of
> 'gen_server:call(...)', it is probably a bad idea to try to solve the
> problems with RPC that have been known for ages, such as "what do I do
> if the server goes down", by adding some ad-hoc handling code to every
> remote call. (It can and will be screwed up anyway.) I think that the
> interface should be used as it was intended (treating exceptions due
> to server-down as any other exception out from the call), and that
> additional supervision should be placed somewhere else, outside the
> main program logic.
> Sean is basically right here: he _ought_ to be able to use normal
> links for this purpose (after all, links are the central built-in
> "additional supervision" method in Erlang), regardless of whether
> the implementation of gen_server:call() does things with links and
> trapping of signals: that stuff should have been made transparent to
> the user, but is obviously not. (One problem is that there can only
> be a single link between two processes, so gen_server can't know
> whether or not it should re-issue the caught signal to the caller.)
Actually it can, because gen_server can rely solely on monitors for
its own purposes. If it gets an 'EXIT' message, it can be certain
that this is because the two processes have been explicitly linked.
I would be happy to have a compatibility mode for dealing with old
nodes, but I think the default behaviour should be for gen_server to
selectively receive its own 'DOWN' message and leave the 'EXIT'
message on the queue.
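A minimal sketch of this monitor-only idea, in Erlang. The module name monitor_call, the '$call' message format and the throwaway crashing server in demo/0 are all hypothetical; this is not gen_server's actual code or protocol. The point is the receive in call/3: it can only match this call's reply or this call's own 'DOWN', so an 'EXIT' message caused by an explicit link is never consumed.

```erlang
-module(monitor_call).
-export([call/3, demo/0]).

%% Hypothetical monitor-only synchronous call (not gen_server's real code).
%% Server must be a pid that answers {'$call', {From, Ref}, Request}
%% by sending {Ref, Reply} back to From.
call(Server, Request, Timeout) when is_pid(Server) ->
    Ref = erlang:monitor(process, Server),
    Server ! {'$call', {self(), Ref}, Request},
    %% Selective receive: only this call's reply or this call's own 'DOWN'
    %% can match, so any {'EXIT', ...} from a link stays in the mailbox.
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Server, Reason} ->
            exit({Reason, {?MODULE, call, [Server, Request, Timeout]}})
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        exit({timeout, {?MODULE, call, [Server, Request, Timeout]}})
    end.

%% Linked caller that traps exits and wraps the call in a catch. The
%% server dies mid-call: the call raises an exception AND the 'EXIT'
%% message is still there for the caller's ordinary link handling.
demo() ->
    OldFlag = process_flag(trap_exit, true),
    %% Throwaway server that crashes as soon as it receives a request.
    Server = spawn_link(fun() ->
                                receive {'$call', _From, _Req} -> exit(crashed) end
                        end),
    Caught = (catch call(Server, ping, 1000)),
    Exit = receive
               {'EXIT', Server, Why} -> {got_exit, Why}
           after 1000 -> no_exit
           end,
    process_flag(trap_exit, OldFlag),
    {Caught, Exit}.
```

Calling monitor_call:demo() returns a tuple of the form {{'EXIT', {crashed, {monitor_call, call, [ServerPid, ping, 1000]}}}, {got_exit, crashed}}: the caller sees the exception from the call and still receives its 'EXIT' message afterwards.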
> If this aspect of gen_server (and similar library functions) cannot
> be fixed, e.g. by using monitors instead of links, then at a minimum it
> should be documented that the functions will steal exit signals if you
> try to link directly to the server.
I agree with the documentation comment. It was extremely surprising
the first time I saw this behaviour. It resulted in several outages
of live systems where processes were not restarted simply because of
when they died (not code written by me, so at least two folks have
had this problem). There must be many other systems out there that
are just waiting to suffer the same fate.
> Meanwhile, the fix I suggested previously should work fine: use an
> intermediate process, whose signals the gen_server library does not
> interfere with.
Requiring a third process between the two linked processes just to
propagate the 'EXIT' seems like extreme overkill. In my current
application the two processes are dynamically created per call, so this
would add 50% more processes (three instead of two) to every request.
I don't buy the backwards compatibility argument for this unintuitive
and IMO buggy behaviour. If we look at the cases:
1. Two processes are not linked.
Today - if the other process dies during the call then gen:call()
just throws an exception.
With my change - exactly the same
2. Two processes are linked, with the gen:call not wrapped in a catch
Today - if the other process dies during the call it throws an
exception and the local process dies
With my change - same result; the 'EXIT' message simply arrives after
the caller has already died
3. Two processes are linked, gen:call wrapped in a catch, trap_exit = true
Today - if the other process dies during the call, an exception
is caught. There is no 'EXIT' message, even though one has to be
handled if the linked process dies at any other time.
With my change - the same exception is raised from the call, but the
existing 'EXIT' message handling will also be invoked. To me this
is a pure bug fix
4. Two processes are linked, gen:call wrapped in a catch, trap_exit = false
Today - the exception is raised as normal, and the calling process lives on.
With my change - the exception is raised as normal, but the calling
process is killed later by the 'EXIT' signal.
This last case could be seen as a backwards compatibility problem,
but given that the called process can potentially die at any time
outside the call, I would say that the gen_server behaviour is just
hiding a latent bug in the original code, which is likely to surface
at some point anyway.
In the worst case, we could define a separate call with the new behaviour.