<div dir="ltr">I have an explanation for why it appears as if an 'EXIT' message isn't received and processed. It seems kind of obvious looking back, but also subtle enough that I don't feel like a complete idiot :) Hopefully this may help someone else who trips over the problem. I'm also interested in feedback, especially concerning some alternatives I describe below for addressing it.<div><br></div><div>So to recap, I have a parent process that traps exits and creates a child process using spawn_link. The parent has a receive loop with a clause that matches 'EXIT'. The problem I encountered was that the child's 'EXIT' message wasn't always received by the parent before the parent tried to call the linked child process. In this case the parent process would fail with a "noproc" error (i.e., the child process no longer exists).</div><div><br></div><div>The cause appears to be the order of messages in the parent's mailbox. The 'EXIT' message is handled just like any other message, in that a process's mailbox behaves like a queue (FIFO). So the 'EXIT' message is placed in the mailbox after any other unprocessed messages, and those messages will be handled before the 'EXIT' message. In my case, handling one of the earlier messages makes a synchronous call to the failed child process (amqp_channel:call/2, not a bare PID ! Message, which would not fail) before the 'EXIT' message is seen. It's a simple race condition. I have seen test results that support this theory. Am I correctly characterizing the handling of 'EXIT' messages with respect to general mailbox behavior?</div><div><br></div><div>Since this behavior is non-deterministic, my simple approach of handling 'EXIT' messages won't work. Alternatives I'm considering include:</div><div>1. Selective receives, but I don't really see this as a robust approach to handling this given the description of this approach in LYSE. LYSE also describes a better approach using min_heap, but this seems like overkill in my case.</div><div>2. 
Handling 'EXIT' messages and backing that up with a try/catch for the race condition.</div><div>3. Forget linking altogether and just handle the problem with a try/catch block</div><div><br></div><div>I'm leaning towards option 3 and I'm interested in other opinions/options.</div><div><br></div><div>Finally, this seems like a fairly common use case, especially in RabbitMQ applications (where the recommendation is to monitor/link amqp_channel processes). But maybe I'm missing something, or misusing 'EXIT'? Any comments?</div><div><br></div><div>Thanks,</div><div>Rich</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 17, 2015 at 3:25 PM, Youngkin, Rich <span dir="ltr"><<a href="mailto:richard.youngkin@pearson.com" target="_blank">richard.youngkin@pearson.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>I've got an app that spawn_links processes with trap_exit. I'm killing the linked processes but the monitoring process isn't always receiving the 'EXIT' message. Here are some code snippets:</div><div><br></div><div>...</div><div> process_flag(trap_exit, true),</div><div> link(Connection),</div><div> link(Channel),</div><div>...</div><div><br></div><div>loop(State) -></div><div> ...</div><div> {'EXIT', What, Reason} -></div><div> do_something_smart();</div><div><br></div><div> ...<br></div><div><br></div><div>Connection and Channel are a RabbitMQ connection and channel (although that's not necessarily important to know). I'm manually running "force close" on the connection via the RabbitMQ admin interface to trigger the 'EXIT'. In one case the 'EXIT' message is received and in the other case it isn't. 
Here are more code snippets to illustrate this (same loop/1 function as above):</div><div><br></div><div>loop(#state{channel=Channel, delay_ack= DelayAck} = State) -></div><div> ...</div><div><br></div><div> {#'basic.deliver'{delivery_tag=DeliveryTag}, Content} -></div><div> ... do something with the content</div><div> case DelayAck of</div><div> true -></div><div> timer:sleep(500), %% allow time for 'EXIT' to arrive in the mailbox before "ack_delivery" message</div><div> self() ! {ack_delivery, Channel, DeliveryTag},</div><div> loop(State);</div><div> _ -></div><div> amqp_channel:call(Channel, #'basic.ack'{delivery_tag=DeliveryTag}),</div><div> loop(State)</div><div> end;</div><div><br></div><div> {ack_delivery, Channel, DeliveryTag} -></div><div> timer:sleep(50), %% ack delay</div><div><div> amqp_channel:call(Channel, #'basic.ack'{delivery_tag=DeliveryTag}),</div><div> loop(State);</div></div><div><br></div><div> ...</div><div><br></div><div>In the above snippet DelayAck specifies whether the actual ack happens immediately or as a result of sending another message through loop/1.</div><div>When DelayAck is false the 'EXIT' message is received as expected. When DelayAck is true there is a sleep of 500ms in order to allow the 'EXIT' to arrive in the mailbox before the {ack_delivery, Channel, DeliveryTag} message. But in this case the 'EXIT' message isn't received. The process instead fails with a "noproc" when invoking amqp_channel:call/2 in {ack_delivery...}. This makes sense since the Channel is now invalid, but I did expect 'EXIT' to be received first thereby avoiding this failure. Increasing the sleep before sending the {ack_delivery...} message doesn't make any difference (except to delay the "noproc" failure). 
The behavior described in this paragraph is consistent across several test runs.</div><div><br></div><div>What would explain why the 'EXIT' message isn't received (ahead of the ack_delivery message, or even at all) in the DelayAck case?</div><div><br></div><div>Thanks,<br>Rich</div><div><br></div><div><br></div><div><br></div></div>
</blockquote></div><br></div>
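<div><br></div><div>The mailbox ordering described at the top of this message, together with option 2 (handle 'EXIT' and back it up with a try/catch), can be sketched roughly as follows. This is only an illustration under assumptions: the module name exit_race_sketch and the helper safe_call/2 are hypothetical, and a plain gen_server:call/2 stands in for amqp_channel:call/2 so that the sketch does not need the RabbitMQ client.</div>

```erlang
-module(exit_race_sketch).
-export([demo/0, safe_call/2]).

%% Hypothetical helper: turn the exit raised by gen_server:call/2 on a
%% dead (or dying) process into an error tuple instead of crashing.
safe_call(Pid, Request) ->
    try
        {ok, gen_server:call(Pid, Request)}
    catch
        exit:{noproc, _} -> {error, noproc};
        exit:{Reason, _} -> {error, Reason}
    end.

%% Queue a work message first, then let the linked child exit, so the
%% 'EXIT' message ends up behind the work message in the mailbox.
demo() ->
    process_flag(trap_exit, true),
    Child = spawn_link(fun() -> receive stop -> ok end end),
    self() ! {work, Child},
    Child ! stop,
    timer:sleep(100),   %% give the exit signal time to arrive
    drain([]).

%% Handle messages in mailbox order and record what happened.
drain(Acc) ->
    receive
        {work, Pid} ->
            drain([{work, safe_call(Pid, ping)} | Acc]);
        {'EXIT', _Pid, Reason} ->
            drain([{exit, Reason} | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

<div>Calling exit_race_sketch:demo() from a fresh process should return a list in which the work entry (carrying {error, noproc}) comes before the exit entry, which is the same ordering that produced the "noproc" failure. With the catch in place, the late 'EXIT' can still be handled in its own clause.</div>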