<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div apple-content-edited="true" class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On Jan 26, 2015, at 11:18 AM, Bob Ippolito &lt;<a href="mailto:bob@redivi.com" class="">bob@redivi.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">On Mon, Jan 26, 2015 at 9:39 AM, Bryan <span dir="ltr" class="">&lt;<a href="mailto:bryan@go-factory.net" target="_blank" class="">bryan@go-factory.net</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi Everyone,<div class=""><br class=""></div><div class="">I was hoping to better understand an interesting condition I recently encountered and was able to alleviate, though I am not 100% clear why.</div><div class=""><br class=""></div><div class="">In our system, we have two types of main processes: for simplicity's sake, let's just call them groups (A) and endpoints (B). Each of these processes is implemented as a gen_server.</div><div class=""><br class=""></div><div class="">Process A implements functionality and represents a group of endpoints. These endpoints are each instantiated as a process B. Each endpoint can be a member of multiple groups. If I have two groups, then I will have two A processes. If I have five endpoints, I will then have five B processes. In our example, endpoint process #3 is a member of groups one and two.</div><div class=""><br class=""></div><div class="">The system is very simple. If a change occurs in A, a message is sent to each member endpoint process B. In our example, the group #1 process would send a message to five endpoint processes. 
If a change occurs in an endpoint process B, a message is sent to each group process A it is a member of. In our example, if this is endpoint #3, it sends a message to both groups one and two.</div><div class=""><br class=""></div><div class="">Seems simple enough. The interesting condition I ran into was that one of the messages from group process A to endpoint process B was a cast; all other messages between the two gen_servers are calls. When A sent the cast message to B, B simply updated its state. For reasons that are not clear to me, the system ultimately reaches a state where all the processes start timing out, even though there are no call/cast cycles.</div><div class=""><br class=""></div><div class="">I know that call cycles introduce a deadlock condition, but I am trying to understand why a cast, which is supposed to return immediately and be handled asynchronously, would produce a timeout.</div><div class=""><br class=""></div><div class="">When I change this message from a cast to a call, the system works perfectly.</div></div></blockquote><div class=""><br class=""></div><div class="">Just a guess, but I would check to make sure that the code for handle_cast in the recipient "B" process wasn't doing something to make it unresponsive, such that the next call to that process would time out. Are you sure that there was no call as a result of that handle_cast?</div><div class=""> </div></div></div></div>

</div></blockquote></div><br class=""><div class=""><br class=""></div><div class="">The handle_cast simply updates the state of the recipient process. It adds an element to a very small list (fewer than 10 entries) kept in the recipient's state record, then returns immediately. There are no other function calls. That is what makes this curious.</div><div class=""><br class=""></div><div class="">I did more investigation, tracing the function calls and looking for cycles anywhere in the stack. I found something interesting, which I still find a bit confusing. The call that is timing out is the simple handle_call I described above. There is another message that was handled by handle_cast; when I switched it to handle_call, the system worked fine, with no timeouts.</div><div class=""><br class=""></div><div class="">What I discovered in the function body of that handle_cast (the one I switched to handle_call) is that there is a cycle: the recipient process then makes a call back to the calling process to notify it of the change that was introduced. This is definitely broken. What confuses me now is why it works at all.</div><div class=""><br class=""></div><div class="">In more detail:</div><div class=""><br class=""></div><div class="">Process A's function_a() makes a call to process B simply to update B's state. This is a very simple, lightweight call.</div><div class=""><br class=""></div><div class="">Process A's function_b() casts a more complex message to process B. B handles it and updates its state, but then accidentally makes a call back to process A (the cycle).</div><div class=""><br class=""></div><div class="">This should not work under any circumstances, but when I switch handle_cast to handle_call in function_b(), there are no timeouts.</div><div class=""><br class=""></div><div class="">Thanks for the input! Much appreciated.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Bryan</div></body></html>