[erlang-questions] Why isn't 'EXIT' message being received/processed?

Youngkin, Rich richard.youngkin@REDACTED
Thu Feb 26 01:36:50 CET 2015


I have an explanation for why it appears as if an 'EXIT' message isn't
received and processed. It seems kind of obvious looking back, but also
subtle enough that I don't feel like a complete idiot :)  Hopefully this
may help someone else who trips over the problem. I'm also interested in
feedback, especially concerning some alternatives I describe below for
addressing it.

So to recap, I have a process that creates a process using spawn_link. This
parent process has a receive loop with a clause that matches 'EXIT'. The
problem I encountered was that the child's 'EXIT' messages weren't always
being received by the parent process before the parent process tried to
send a message to the linked child process. In this case the parent process
would fail with a "noproc" error (i.e., the child process no longer exists).

The cause appears to be the order in which messages arrive in the parent's
mailbox. With trap_exit set, the 'EXIT' message is handled just like any
other message: a process's mailbox behaves like a FIFO queue, so the 'EXIT'
message is placed behind any messages that are still unprocessed, and those
messages are handled first. In my case, handling one of those earlier
messages makes a call to the failed child process (amqp_channel:call/2, as
in the snippet quoted below) before the 'EXIT' message is seen: a simple
race condition. I have seen test results that support this theory. Am I
correctly characterizing the handling of 'EXIT' messages WRT general
mailbox behavior?
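
The ordering described above can be reproduced in a few lines. This is
only a minimal sketch under stated assumptions: the module name exit_order,
the functions run/0 and parent/0, and the {ack_delivery, 1} message are
hypothetical stand-ins, not code from the application. A 'DOWN' message
from a monitor is consumed first (selectively) purely to be sure the child
is dead before the mailbox is read in arrival order.

```erlang
-module(exit_order).
-export([run/0]).

%% Run the demo in a fresh process so the caller's mailbox and
%% trap_exit flag are left alone.
run() ->
    Caller = self(),
    spawn(fun() -> Caller ! {exit_order_result, parent()} end),
    receive {exit_order_result, Result} -> Result end.

parent() ->
    process_flag(trap_exit, true),
    %% An ordinary message is already queued before the child dies.
    self() ! {ack_delivery, 1},
    %% The child exits at once; its exit signal becomes an 'EXIT'
    %% message that is queued behind the message above.
    Child = spawn_link(fun() -> ok end),
    %% Selectively wait for 'DOWN' so the child is known to be dead.
    Ref = erlang:monitor(process, Child),
    receive {'DOWN', Ref, process, Child, _} -> ok end,
    %% Read the mailbox in arrival order.
    First = element(1, receive M1 -> M1 end),
    ChildAlive = is_process_alive(Child),
    Second = element(1, receive M2 -> M2 end),
    {First, ChildAlive, Second}.
```

Calling exit_order:run() returns {ack_delivery, false, 'EXIT'}: the
ordinary message is handled while the child is already dead, and the
'EXIT' is only seen afterwards.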

Since this behavior is non-deterministic my simple approach of handling
'EXIT' messages won't work. Alternatives I'm considering include:
1. Selective receives, but I don't really see this as a robust approach to
handling this given the description of this approach in LYSE. LYSE also
describes a better approach using a min-heap, but this seems like overkill
in my case.
2. Handling 'EXIT' messages and backing that up with a try/catch for the
race condition.
3. Forget linking altogether and just handle the problem with a try/catch
block.
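
For reference, option 1 could look roughly like the sketch below. It
assumes a hypothetical module exit_first with a helper next_message/0;
neither name comes from the application. The helper uses "after 0" to pull
a pending 'EXIT' out of the mailbox ahead of older messages. Note that
this only narrows the race rather than closing it, since the child can
still die right after the check, and the selective receive has to scan
the mailbox each time.

```erlang
-module(exit_first).
-export([next_message/0, demo/0]).

%% Return a pending 'EXIT' message if one is anywhere in the mailbox;
%% otherwise block for the next message of any kind.
next_message() ->
    receive
        {'EXIT', _From, _Reason} = Exit -> Exit
    after 0 ->
        receive Any -> Any end
    end.

%% Run the demo in a fresh process so the caller is not affected.
demo() ->
    Caller = self(),
    spawn(fun() -> Caller ! {exit_first_result, worker()} end),
    receive {exit_first_result, Result} -> Result end.

worker() ->
    process_flag(trap_exit, true),
    self() ! {ack_delivery, 1},      % ordinary work queued first
    spawn_link(fun() -> ok end),     % child exits at once
    %% Demo-only step: block until the 'EXIT' has arrived, then put it
    %% back at the tail so the mailbox is known to be
    %% [ack_delivery, 'EXIT'].
    receive {'EXIT', _, _} = E -> self() ! E end,
    First = next_message(),
    Second = next_message(),
    {element(1, First), element(1, Second)}.
```

exit_first:demo() returns {'EXIT', ack_delivery}: the 'EXIT' is taken
first even though the ordinary message was queued ahead of it.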

I'm leaning towards option 3 and I'm interested in other opinions/options.
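
A minimal sketch of the try/catch used in options 2 and 3, assuming the
call goes through gen_server:call/2 (as far as I can tell
amqp_channel:call/2 does, but treat that as an assumption). The module
name guarded_call and its functions are hypothetical. A bare "Pid ! Msg"
to a dead local pid does not raise; it is the call-style API that exits
with {noproc, ...}, so that is what gets caught here.

```erlang
-module(guarded_call).
-export([call/2, demo/0]).

%% Wrap a gen_server-style call so that a dead or dying target yields
%% an error tuple instead of crashing the caller.
call(Pid, Request) ->
    try
        {ok, gen_server:call(Pid, Request)}
    catch
        %% The target was already gone when the call was made.
        exit:{noproc, _} -> {error, noproc};
        %% Any other failure reported by gen_server:call/2, for example
        %% the target dying mid-call or the call timing out.
        exit:{Reason, {gen_server, call, _}} -> {error, Reason}
    end.

%% Call a process that is known to be dead.
demo() ->
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,
    call(Pid, ping).
```

guarded_call:demo() returns {error, noproc}. With option 2 the 'EXIT'
clause in the receive loop stays as the normal path and this wrapper only
covers the window before the 'EXIT' has been processed.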

Finally, this seems like a fairly common use case, especially in RabbitMQ
applications (where the recommendation is to monitor/link amqp_channel
processes). But maybe I'm missing something, or misusing 'EXIT'?  Any
comments?

Thanks,
Rich

On Tue, Feb 17, 2015 at 3:25 PM, Youngkin, Rich <
richard.youngkin@REDACTED> wrote:

> Hi,
>
> I've got an app that spawn_links processes with trap_exit. I'm killing the
> linked processes but the monitoring process isn't always receiving the
> 'EXIT' message.  Here are some code snippets:
>
> ...
>   process_flag(trap_exit, true),
>   link(Connection),
>   link(Channel),
> ...
>
> loop(State) ->
>   ...
>   {'EXIT', What, Reason} ->
>     do_something_smart();
>
>   ...
>
> Connection and Channel are a RabbitMQ connection and channel (although
> that's not necessarily important to know). I'm manually running "force
> close" on the connection via the RabbitMQ admin interface to trigger the
> 'EXIT'.  In one case the 'EXIT' message is received and in the other case
> it isn't. Here are more code snippets to illustrate this (same loop/1
> function as above):
>
> loop(#state{channel=Channel, delay_ack= DelayAck} = State) ->
>   ...
>
>   {#'basic.deliver'{delivery_tag=DeliveryTag}, Content} ->
>     ... do something with the content
>     case DelayAck of
>       true ->
>         %% allow time for 'EXIT' to arrive in the mailbox before the
>         %% "ack_delivery" message
>         timer:sleep(500),
>         self() ! {ack_delivery, Channel, DeliveryTag},
>         loop(State);
>       _ ->
>         amqp_channel:call(Channel, #'basic.ack'{delivery_tag=DeliveryTag}),
>         loop(State)
>     end;
>
>   {ack_delivery, Channel, DeliveryTag} ->
>     timer:sleep(50), %% ack delay
>     amqp_channel:call(Channel, #'basic.ack'{delivery_tag=DeliveryTag}),
>     loop(State);
>
>   ...
>
> In the above snippet DelayAck specifies whether the actual ack happens
> immediately or as a result of sending another message through loop/1.
> When DelayAck is false the 'EXIT' message is received as expected. When
> DelayAck is true there is a sleep of 500ms in order to allow the 'EXIT' to
> arrive in the mailbox before the {ack_delivery, Channel, DeliveryTag}
> message. But in this case the 'EXIT' message isn't received. The process
> instead fails with a "noproc" when invoking amqp_channel:call/2 in
> {ack_delivery...}. This makes sense since the Channel is now invalid, but I
> did expect 'EXIT' to be received first thereby avoiding this failure.
> Increasing the sleep before sending the {ack_delivery...} message doesn't
> make any difference (except to delay the "noproc" failure). The behavior
> described in this paragraph is consistent across several test runs.
>
> What would explain why the 'EXIT' message isn't received (ahead of the
> ack_delivery message, or even at all) in the DelayAck case?
>
> Thanks,
> Rich