[erlang-questions] Timeout Erlang GenServer Crash Loop
Matthew Evans
mattevans123@REDACTED
Fri Oct 12 18:40:22 CEST 2012
It's hard to answer without knowing what your code is doing (e.g. there may be an inefficiency somewhere). However, a common design pattern when a gen_server has to do complex work is to spawn another process to do the task, so the gen_server itself stays free to handle the next request. On a multicore machine the work will also be distributed across the cores:
e.g.
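handle_call({some_operation, Data}, From, State) ->
    %% Do the slow work in a separate process and reply from there,
    %% so the gen_server can return to its message queue immediately.
    spawn(fun() ->
        Rsp = do_lots_of_work(Data),
        gen_server:reply(From, Rsp)
    end),
    {noreply, State};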
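For anyone who wants to try the pattern above in isolation, here is a minimal, self-contained sketch. The module name offload_sketch and the body of do_lots_of_work/1 (it only sleeps and echoes its input) are assumptions made for illustration; they are not taken from the original poster's code.

```erlang
-module(offload_sketch).
-behaviour(gen_server).

-export([start_link/0, some_operation/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Client API: a normal synchronous call with the default 5000 ms timeout.
some_operation(Data) ->
    gen_server:call(?MODULE, {some_operation, Data}).

init([]) ->
    {ok, []}.

handle_call({some_operation, Data}, From, State) ->
    %% Hand the slow work to a short-lived process. The server returns
    %% {noreply, ...} right away and can serve the next queued request.
    spawn(fun() ->
        Rsp = do_lots_of_work(Data),
        gen_server:reply(From, Rsp)
    end),
    {noreply, State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Msg, State) ->
    {noreply, State}.

%% Hypothetical stand-in for the real work: sleep, then echo the input.
do_lots_of_work(Data) ->
    timer:sleep(200),
    {done, Data}.
```

One caveat with this sketch: if the spawned process crashes, gen_server:reply/2 is never called and the caller still runs into its timeout. Under heavy load every request also spawns a process, so some form of limit may be needed.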
Date: Fri, 12 Oct 2012 00:31:13 -0700
From: mjtruog@REDACTED
To: codeithere@REDACTED
CC: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Timeout Erlang GenServer Crash Loop
Ok, if you are experiencing latency with file io, make sure you have
async thread pool threads set on the Erlang VM with something like:
erl +A 5
If it is related to the async thread pool, the job queue is not
shared between the threads... it is a queue per thread, so the size
of the async thread pool can impact the wait time... meaning that
file io can take longer if the async thread pool is smaller, but you
normally don't need a large number of async threads started.
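To confirm that the +A setting actually took effect, the size of the async thread pool can be read back from the running VM. A small sketch, where the module and function names are hypothetical and only erlang:system_info/1 is a real call:

```erlang
-module(async_pool_sketch).
-export([pool_size/0]).

%% Number of async threads in the pool of the running VM,
%% i.e. the value given with +A when the VM was started.
pool_size() ->
    erlang:system_info(thread_pool_size).
```

The same expression, erlang:system_info(thread_pool_size), can also be typed directly into the shell of the node started with erl +A 5.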
If it is socket stuff, it might be related to the encoding, but
there are many possibilities down that road.
On 10/12/2012 12:15 AM, Code Box wrote:
Thanks for your reply. I really appreciate it. I am sure I do have
a lot of load on my server, a few thousand requests per second.
But the process that is timing out is not waiting on any other
process; that call just does a . So I definitely think the cause is
that the process is overloaded and all the other requests sitting
in its message queue are timing out. I am trying to prove this by
looking at the server metrics for CPU, memory, and IO. In the IO
stats I do see a big spike. That could be why other processes are
blocked while the IO is happening, which can cause CPU
contention.
On Thu, Oct 11, 2012 at 10:18 PM, Michael Truog <mjtruog@REDACTED>
wrote:
Well a common problem is to have the process also
blocked on its own synchronous call, so that can keep
the CPU usage low, since it is spending time mostly
idle waiting for 1 or more responses from some other
processes. The best way I have seen to deal with this
type of timeout problem is to always pass the timeouts
in the message like this:
    gen_server:call(<process>, {<message>, Timeout - DELTA}, Timeout)
Where DELTA can be 100 milliseconds. Then the
(Timeout-DELTA) value the handle_call sees can be used
for any internally synchronous calls. However, then
the problem becomes understanding what the cumulative
delay might be, if there are multiple synchronous
calls used within the process. Ideally, the process
is kept simpler, so it doesn't need to try and track
many synchronous calls.
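A minimal sketch of the timeout-passing idea described above, assuming a front server that makes exactly one inner synchronous call to a backend. The module name budget_sketch, the {fetch, Budget} and get messages, and the sleeping backend are all made up for illustration; DELTA is 100 milliseconds as suggested.

```erlang
-module(budget_sketch).
-behaviour(gen_server).

-export([start_backend/1, start_front/1, fetch/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(DELTA, 100).

%% Hypothetical backend that takes DelayMs to answer each request.
start_backend(DelayMs) ->
    gen_server:start_link(?MODULE, {backend, DelayMs}, []).

%% Front server that forwards requests to Backend.
start_front(Backend) ->
    gen_server:start_link(?MODULE, {front, Backend}, []).

%% The remaining budget travels inside the message, so the server
%% knows how long it may block on its own inner call.
fetch(Front, Timeout) when is_integer(Timeout), Timeout > ?DELTA ->
    gen_server:call(Front, {fetch, Timeout - ?DELTA}, Timeout).

init(Role) ->
    {ok, Role}.

handle_call(get, _From, {backend, DelayMs} = State) ->
    timer:sleep(DelayMs),
    {reply, {ok, value}, State};
handle_call({fetch, Budget}, _From, {front, Backend} = State) ->
    %% Use the caller's budget for the inner call and turn an inner
    %% timeout into an ordinary reply instead of crashing.
    Reply = try
                gen_server:call(Backend, get, Budget)
            catch
                exit:{timeout, _} -> {error, backend_timeout}
            end,
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% A late reply from the backend may still arrive after a timeout; drop it.
handle_info(_Msg, State) ->
    {noreply, State}.
```

With a backend that needs 600 ms and a caller timeout of 400 ms, the front server gives up on the inner call after 300 ms and the caller receives {error, backend_timeout} rather than exiting. With several inner calls the budget would have to be split or tracked across them, which is the cumulative delay problem mentioned above.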
I am not entirely sure if this is your problem, since
it could also be latency due to function calls, if
function calls are blocking the schedulers, or something
stranger such as code loading locking the schedulers. Usually
those issues aren't as common a concern though.
On 10/11/2012 10:05 PM, Code Box wrote:
If the process is overloaded, shouldn't that show up
in the CPU stats and memory stats of my host? I
see CPU usage at just 50%.
On Thu, Oct 11, 2012 at 9:14 PM, Michael
Truog <mjtruog@REDACTED>
wrote:
On 10/11/2012 09:03 PM, Code Box
wrote:
** Reason for termination ==
** {timeout,{gen_server,call,[thetime,gettime]}}

=CRASH REPORT==== 2012-10-09 05:37:04 UTC ===
  crasher:
    initial call: process_listener:-init/1-fun-2-/0
    pid: <0.12376.513>
    registered_name: []
    exception exit: {timeout,{gen_server,call,[thetime,gettime]}}
      in function  gen_server:terminate/6
    ancestors: [incoming_req_processor,incoming_sup,top_process_sup,
                  <0.52.0>]
    messages: []
    links: []
    dictionary: [{random_seed,{23375,22820,17046}}]
    trap_exit: true
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 1646842
  neighbours:

I am seeing a lot of these
messages in my crash reports. Once
this happens, the system goes into
a crash loop for quite a while.
I am not sure how to debug this
error. These timeouts are really
annoying. Can someone help me
understand the root cause of this?
Why are my gen_server calls
timing out? Is it that my
Erlang VM is slow, and if so, why? How
can I debug this issue to get to
the root cause of it?
If you look at gen_server:call/2 at http://www.erlang.org/doc/man/gen_server.html
it shows the default Timeout is 5000
milliseconds (5 seconds). Your gen_server
process must have been processing for
longer than 5 seconds while a
gen_server:call/2 message was waiting in
the process message queue, to cause the
timeout exception. So, it isn't the
Erlang VM being slow, it is just an Erlang
process that is overloaded (i.e., the
"thetime" registered process).
_______________________________________________
erlang-questions mailing list
erlang-questions@REDACTED
http://erlang.org/mailman/listinfo/erlang-questions