[erlang-questions] Timeout Erlang GenServer Crash Loop

Michael Truog mjtruog@REDACTED
Fri Oct 12 09:31:13 CEST 2012


OK, if you are experiencing latency with file I/O, make sure the Erlang VM is started with async thread pool threads, with something like:
erl +A 5

If it is related to the async thread pool, note that the job queue is not shared between the threads: there is one queue per thread, so the size of the async thread pool affects the wait time. That means file I/O can take longer if the async thread pool is smaller, but you normally don't need a large number of async threads.
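
To confirm that the +A setting actually took effect, the pool size can be read back from a running node. This is only a minimal sketch to paste into the Erlang shell; the variable name PoolSize is just for illustration:

```erlang
%% Run in an erl shell started with, for example, `erl +A 5`.
%% thread_pool_size reports the number of async threads in the pool.
PoolSize = erlang:system_info(thread_pool_size),
io:format("async threads: ~p~n", [PoolSize]).
```

If the number printed does not match what you passed to +A, the flag is probably not reaching the VM (for example, it is missing from the start script).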

If it is socket stuff, it might be related to the encoding, but there are many possibilities down that road.
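
For reference, the timeout-passing suggestion quoted below could look roughly like this. It is a sketch under assumptions, not your code: the module name deadline_call, the gettime/2 wrapper and the nested_lookup/1 helper are all hypothetical, and DELTA is the 100 milliseconds mentioned in the quoted message.

```erlang
-module(deadline_call).
-behaviour(gen_server).

-export([start_link/0, gettime/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Safety margin in milliseconds, subtracted from the caller's timeout.
-define(DELTA, 100).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Client side: put the remaining time budget inside the request itself.
gettime(Server, Timeout) when is_integer(Timeout), Timeout > ?DELTA ->
    gen_server:call(Server, {gettime, Timeout - ?DELTA}, Timeout).

init([]) ->
    {ok, nostate}.

%% Server side: Budget is how long the caller is still willing to wait.
%% Hand it (or less) to any nested synchronous call so that the inner
%% call gives up before the outer caller does.
handle_call({gettime, Budget}, _From, State) ->
    Reply = nested_lookup(Budget),
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Hypothetical stand-in for gen_server:call(Other, Request, Budget).
nested_lookup(_Budget) ->
    {ok, os:timestamp()}.
```

With several nested synchronous calls the budget would have to be reduced after each one, which is where the cumulative delay problem mentioned in the quoted message comes from.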

On 10/12/2012 12:15 AM, Code Box wrote:
> Thanks for your reply. I really appreciate it. I am sure I do have a lot of load on my server, like a few thousand requests per second. But the process getting the timeout is not waiting on any other process; that call just does a . So I definitely think the reason is that the process is overloaded and all the other requests waiting in its message queue are timing out. I am trying to prove this by looking at the server metrics for CPU, memory, and I/O. Speaking of I/O stats, I do see a big spike there. That could be the reason other processes are blocked while the I/O is happening, which can cause CPU contention.
>
>
>
> On Thu, Oct 11, 2012 at 10:18 PM, Michael Truog <mjtruog@REDACTED> wrote:
>
>     Well, a common problem is to have the process also blocked on its own synchronous call, which can keep the CPU usage low, since it spends most of its time idle, waiting for one or more responses from other processes.  The best way I have seen to deal with this type of timeout problem is to always pass the timeout in the message, like this:
>     gen_server:call(<process>, {<message>, Timeout - DELTA}, Timeout)
>     Where DELTA can be 100 milliseconds.  Then the (Timeout-DELTA) value the handle_call sees can be used for any internally synchronous calls.  However, then the problem becomes understanding what the cumulative delay might be, if there are multiple synchronous calls used within the process.  Ideally, the process is kept simpler, so it doesn't need to try and track many synchronous calls.
>
>     I am not entirely sure if this is your problem, since it could also be latency due to function calls, if function calls are blocking schedulers, or something strange, like code loading locking schedulers.  Those issues usually aren't as common a concern, though.
>
>
>     On 10/11/2012 10:05 PM, Code Box wrote:
>>     If the process is overloaded, shouldn't that show up in the CPU and memory stats of my host? I see CPU usage at just 50%.
>>
>>     On Thu, Oct 11, 2012 at 9:14 PM, Michael Truog <mjtruog@REDACTED> wrote:
>>
>>         On 10/11/2012 09:03 PM, Code Box wrote:
>>>         ** Reason for termination ==
>>>         ** {timeout,{gen_server,call,[thetime,gettime]}}
>>>
>>>         =CRASH REPORT==== 2012-10-09 05:37:04 UTC ===
>>>           crasher:
>>>             initial call: process_listener:-init/1-fun-2-/0
>>>             pid: <0.12376.513>
>>>             registered_name: []
>>>             exception exit: {timeout,{gen_server,call,[thetime,gettime]}}
>>>               in function  gen_server:terminate/6
>>>             ancestors: [incoming_req_processor,incoming_sup,top_process_sup,
>>>                           <0.52.0>]
>>>             messages: []
>>>             links: []
>>>             dictionary: [{random_seed,{23375,22820,17046}}]
>>>             trap_exit: true
>>>             status: running
>>>             heap_size: 6765
>>>             stack_size: 24
>>>             reductions: 1646842
>>>           neighbours:
>>>
>>>         I am seeing a lot of these messages in my crash reports. Once it reaches this state, it stays in this crash loop for quite a while. I am not sure how to debug this error. These timeouts are really annoying. Can someone help me understand the root cause of this?
>>>
>>>         Why are my gen_server calls timing out? Is it that my Erlang VM is slow, and if so, why? How can I debug this issue to get to the root cause?
>>>
>>         If you look at gen_server:call/2 at http://www.erlang.org/doc/man/gen_server.html
>>         it shows the default Timeout is 5000 milliseconds (5 seconds).  Your gen_server process must have been processing for longer than 5 seconds while a gen_server:call/2 message was waiting in the process message queue, to cause the timeout exception.  So, it isn't the Erlang VM being slow, it is just an Erlang process that is overloaded (i.e., the "thetime" registered process).
>>
>>
>>
>>
>
>
