Broken gen:call/3?

Sean Hinde sean.hinde@REDACTED
Tue Nov 1 18:31:53 CET 2005


Interesting. The nature of this bug is such that it will only be
exposed in unusual error cases, particularly ones that are difficult
to create during testing.

I am sure there are many "fully tested" systems which do not  
correctly account for this bug.

There is currently not even a realistic way to examine the exit
reason to determine whether the called process has crashed (as opposed
to the caller crashing). It would actually be very difficult to write
code which correctly works around this bug while still relying on links.

Perhaps the maintenance mode customers would be pleased if you made a  
change which correctly uncovered bugs in their systems?

Sean

On 1 Nov 2005, at 08:25, Raimo Niskanen wrote:

> In the best of worlds you would be right, but since this strange
> behaviour has been tested and has been in production for many, many
> years, you just _might_ not be right. And a behaviour change would
> expose new bugs.
>
> Therefore we assume a behaviour change would make our major
> paying customers, who are in the maintenance phase of their
> products, avoid taking a new OTP release, forcing us to maintain
> one release more than necessary and stealing resources from
> new development...
>
> sean.hinde@REDACTED (Sean Hinde) writes:
>
>
>> Indeed !
>>
>> I wonder how much code there is out there which is currently broken
>> because the author did not realise this happens, vs code which would
>> be broken if it were changed.
>>
>> My guess, based on the assumption that people would expect to have to
>> handle 'EXIT' messages if they have chosen to link, is that this
>> behaviour hides many more latent bugs than would be introduced if it
>> were changed.
>>
>> Sean
>>
>> On 31 Oct 2005, at 14:18, Raimo Niskanen wrote:
>>
>>
>>> Aaah, well, yes.. This is an old flaw.
>>>
>>> Once upon a time there were only links to supervise other
>>> processes, so the only way to know whether a server died during
>>> a library call (e.g. inside gen_server:call, after sending the
>>> request and while waiting for the response) was that an
>>> 'EXIT' message was received instead. The library code for
>>> gen_server:call would then have had to trap exit messages
>>> and set a link to the server.
>>>
>>> But that cannot be done by library code, since there can
>>> be only one link between any pair of processes. Exit message
>>> trapping could possibly be done, but there is a time window
>>> after the receive and before exit message trapping is disabled
>>> that cannot be controlled, so the library code cannot be sure
>>> not to accidentally convert a link exit into an exit message.
>>>
>>> So it was designed such that _if_ the calling process
>>> had activated exit message trapping _and_ set a link to the
>>> server, then gen_server:call could receive the 'EXIT'
>>> message and return an error code as the result of the server call.
>>>
>>> Later, when monitors were introduced, we could not change
>>> the behaviour of gen_server:call to not consume 'EXIT'
>>> messages at all (which would be the right(TM) way in
>>> the presence of monitors); the result would be passing
>>> undesired 'EXIT' messages on to old calling applications.
>>>
>>> So, there we are today. The calling process should check
>>> the result from gen_server:call and also receive 'EXIT' messages,
>>> or set a monitor of its own.
>>>
>>> sean.hinde@REDACTED (Sean Hinde) writes:
>>>
>>>
>>>
>>>> Hi,
>>>>
>>>> This behaviour seems broken to me:
>>>>
>>>> 1. One process is linked to another (for supervision reasons), and
>>>> a gen_*:call/2 synchronous request is made from one to the other.
>>>>
>>>> 2. The called process crashes while handling the call.
>>>>
>>>> 3. gen:call consumes *both* its own monitor 'DOWN' message *and*
>>>> the 'EXIT' message arising from the link.
>>>>
>>>> Result: calling process doesn't get 'EXIT' message, and hence
>>>> doesn't know about the crash. It does not then function well as a
>>>> supervisor...
>>>>
>>>> Sean
>>>>
>>>>
>>>
>>> -- 
>>>
>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>
>>>
>>
>>
>
> -- 
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>




More information about the erlang-questions mailing list