[erlang-questions] Cast versus Call and timeouts
Mon Jan 26 20:38:24 CET 2015
> On Jan 26, 2015, at 11:18 AM, Bob Ippolito <bob@REDACTED> wrote:
> On Mon, Jan 26, 2015 at 9:39 AM, Bryan <bryan@REDACTED> wrote:
>> Hi Everyone,
>> I was hoping to better understand an interesting condition I recently encountered and was able to alleviate, though I am not 100% clear why.
>> In our system, we have two main types of processes: for simplicity's sake, let's just call them groups (A) and endpoints (B). Each of these processes is implemented as a gen_server.
>> Process A implements the group functionality and represents a group of endpoints. Each of these endpoints is instantiated as a process B, and each endpoint can be a member of multiple groups. If I have two groups, I will have two A processes. If I have five endpoints, I will have five B processes. In our example, endpoint process #3 is a member of groups one and two.
>> The system is very simple. If a change occurs in A, a message is sent to each member endpoint process B. In our example, the process for group #1 would send a message to five endpoint processes. If a change occurs in an endpoint process B, a message is sent to each group process A it is a member of. In our example, if this is endpoint #3, it sends a message to both group one and group two.
>> Seems simple enough. The interesting condition I ran into occurred when one of the messages from the group process A to the endpoint process B was a cast. All the others, for both gen_servers, are calls. When A sent the cast message to B, B simply updated its state. For reasons that are not clear to me, the system ultimately reaches a timeout state in which all the processes start timing out, even though there are no calling/casting cycles.
>> I know that calling cycles introduce a deadlock condition, but I am trying to understand why a cast, which is supposed to return immediately and be handled asynchronously, would produce a timeout.
>> When I move this message from a cast to a call, the system works perfectly.
> Just a guess, but I would check to make sure that the code for handle_cast in the recipient "B" process wasn't doing something to make it unresponsive, such that the next call to that process would time out. Are you sure that there was no call as a result of that handle_cast?
The handle_cast simply updates the state of the recipient process. It adds an element to a very small list of fewer than 10 entries kept in the state record of the recipient process, then returns immediately. There are no other function calls. That is what is curious about this.
I did more investigation, tracing the function calls and looking for cycles anywhere in the stack. I found something interesting, which I still find a bit confusing. The call that is timing out is the simple handle_call I described above. There is another message that was handled by handle_cast; when I switched it to handle_call, the system worked fine, with no timeouts.
What I discovered in the body of that handle_cast (the one I switched to handle_call) is that there is a cycle: the recipient process does a gen_server:call back to the calling process to notify it of the change the cast introduced. This is definitely broken. What confuses me now is why it works at all.
In more detail:
Process A's function_a() does a gen_server:call to process B simply to update B's state. This is a very simple, lightweight call.
Process A's function_b() does a gen_server:cast to process B with a more complex message; B's handle_cast updates its state, but then accidentally does a gen_server:call back to process A (the cycle).
This should not work under any circumstances, but when I switch function_b() from a cast to a call (handle_cast to handle_call), there are no timeouts.
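A minimal sketch of the cycle described above, assuming that process A issues function_b()'s cast and then function_a()'s call to the same B from inside one of its own callbacks. The module name `cycle_demo`, the messages `trigger`, `complex_update`, `changed` and `add`, and the timeouts are all hypothetical. Because messages from one process to another arrive in the order they were sent, B handles the cast before the call; inside that handle_cast it calls back into A, but A is still blocked waiting for B's reply, so A's call times out.

```erlang
%% Hypothetical reproduction of the A -> B cast -> A call cycle.
%% Module, message and state names are illustrative only.
-module(cycle_demo).
-behaviour(gen_server).

-export([demo/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Start one "group" (A) and one "endpoint" (B), then ask A to run the
%% cast-then-call sequence. Returns whatever A's call to B produced.
demo() ->
    {ok, B} = gen_server:start(?MODULE, {endpoint, []}, []),
    {ok, A} = gen_server:start(?MODULE, {group, B}, []),
    Result = gen_server:call(A, trigger, 5000),
    ok = gen_server:stop(A),
    ok = gen_server:stop(B),
    Result.

init(State) ->
    {ok, State}.

%% --- group (A) callbacks ---
handle_call(trigger, _From, {group, B} = S) ->
    %% function_b(): asynchronous, returns immediately.
    gen_server:cast(B, {complex_update, self()}),
    %% function_a(): synchronous. B handles the cast first and, while
    %% doing so, calls back into A, which is blocked right here.
    Reply = catch gen_server:call(B, {add, y}, 1000),
    {reply, Reply, S};
handle_call({changed, _X}, _From, {group, _B} = S) ->
    {reply, ok, S};
%% --- endpoint (B) callbacks ---
handle_call({add, X}, _From, {endpoint, List}) ->
    %% The "very simple and light" state update.
    {reply, ok, {endpoint, [X | List]}}.

handle_cast({complex_update, GroupPid}, {endpoint, List}) ->
    %% The accidental cycle: a synchronous call back to A. Its timeout
    %% is longer than A's, so A times out first and can then answer.
    ok = gen_server:call(GroupPid, {changed, x}, 3000),
    {noreply, {endpoint, [x | List]}}.

handle_info(_Msg, S) ->
    %% Ignore stray late replies (possible on older OTP releases).
    {noreply, S}.
```

Running `cycle_demo:demo()` from a shell should return an `{'EXIT', {timeout, ...}}` tuple after roughly one second. A catches the timeout only so the sketch returns a value instead of crashing; B's longer timeout lets A recover first and then answer B's pending call. With the default 5000 ms timeouts and no catch, whichever gen_server's call timed out first would exit instead.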
Thanks for the input! Much appreciated.