exception exit: timeout in gen_server:call
Roberto Ostinelli
ostinelli@REDACTED
Tue Dec 3 19:37:27 CET 2019
This function nothing is but a single postgres query, a “select * from users where id = ‘123’”, properly indexed.
The only thing I can see is a latency towards the db (infra-aws regions unfortunately). It really is that at a certain moment, randomly (sometimes after 5 minutes, other times after 2 days) this happens and there’s no recovery whatsoever.
I could start rejecting http calls if the poolboy can’t perform a checkout, not sure how to do that (need to check the library better).
>> On 3 Dec 2019, at 19:26, Jesper Louis Andersen <jesper.louis.andersen@REDACTED> wrote:
>
> The keys_db function is currently executing gen:do_call/4
>
> It is blocked on some other process somewhere, waiting for it to respond, so it can respond back to the 11906 requests it has :) Chances are that the next request in the queue will do a gen:do_call/4 for the 11905 there are left in the queue. And in the mean time, 30 new requests arrived, so we are now at 11935.
>
> In short, it is likely something is running too slow to handle the current load on the system and furthermore, there is no flow control in the system to make the callers reduce the load.
>
> Since it happens randomly, it is likely there is a single large request, done by a single user now and then. And this request stalls the system. Probably enough to have the queue grow and errors start happening. Put a limit on what can be requested, and force the client to cooperate by having it call in with a "plz continue" token.
>
> You can also ask the system for a more detailed stack trace, see erlang:process_info/2 and current_stacktrace when it goes bad. This can often tell you which gen_server call is being made and to whom, narrowing down the problem.
>
>> On Tue, Dec 3, 2019 at 7:14 PM Roberto Ostinelli <ostinelli@REDACTED> wrote:
>> Forgot to mention: this happens in a completely random way.
>>
>>> On Tue, Dec 3, 2019 at 6:48 PM Roberto Ostinelli <ostinelli@REDACTED> wrote:
>>> All,
>>> I ame experiencing the following error when calling a transaction in poolboy as per the README:
>>>
>>> equery(PoolName, Stmt, Params) ->
>>> poolboy:transaction(PoolName, fun(Worker) ->
>>> gen_server:call(Worker, {equery, Stmt, Params})
>>> end).
>>> ** exception exit: {timeout,{gen_server,call,
>>> [keys_db,{checkout,#Ref<0.0.1.156295>,true},5000]}}
>>> in function gen_server:call/3 (gen_server.erl, line 212)
>>> in call from poolboy:checkout/3 (/home/ubuntu/workspace/myapp/_build/default/lib/poolboy/src/poolboy.erl, line 55)
>>> in call from poolboy:transaction/3 (/home/ubuntu/workspace/myapp/_build/default/lib/poolboy/src/poolboy.erl, line 74)
>>> The process queue keeps on increasing, and I can see the following:
>>>
>>> 3> erlang:process_info(whereis(keys_db)).
>>> [{registered_name,keys_db},
>>> {current_function,{gen,do_call,4}},
>>> {initial_call,{proc_lib,init_p,5}},
>>> {status,waiting},
>>> {message_queue_len,11906},
>>> {messages,[{'$gen_cast',{cancel_waiting,#Ref<0.0.1.138090>}},
>>> {'$gen_call',{<0.15224.0>,#Ref<0.0.1.139621>},
>>> {checkout,#Ref<0.0.1.139620>,true}},
>>> {'$gen_call',{<0.15139.0>,#Ref<0.0.1.139649>},
>>> {checkout,#Ref<0.0.1.139648>,true}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138159>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138175>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138232>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138252>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138261>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138286>}},
>>> {'$gen_call',{<0.15235.0>,#Ref<0.0.1.139774>},
>>> {checkout,#Ref<0.0.1.139773>,true}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.2.77777>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138318>}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138336>}},
>>> {'$gen_call',{<0.15233.0>,#Ref<0.0.1.139816>},
>>> {checkout,#Ref<0.0.1.139815>,true}},
>>> {'$gen_call',{<0.15245.0>,#Ref<0.0.1.139854>},
>>> {checkout,#Ref<0.0.1.139853>,true}},
>>> {'$gen_call',{<0.15237.0>,#Ref<0.0.2.78173>},
>>> {checkout,#Ref<0.0.2.78172>,...}},
>>> {'$gen_cast',{cancel_waiting,#Ref<0.0.1.138407>}},
>>> {'$gen_call',{<0.15228.0>,...},{...}},
>>> {'$gen_call',{...},...},
>>> {'$gen_call',...},
>>> {...}|...]},
>>> {links,[<0.714.1>,<0.817.1>,<0.947.1>,<0.1015.1>,<0.1045.1>,
>>> <0.1048.1>,<0.1038.1>,<0.983.1>,<0.1002.1>,<0.962.1>,
>>> <0.877.1>,<0.909.1>,<0.938.1>,<0.892.1>,<0.849.1>,<0.866.1>,
>>> <0.832.1>,<0.765.1>,<0.789.1>,<0.804.1>|...]},
>>> {dictionary,[{'$initial_call',{poolboy,init,1}},
>>> {'$ancestors',[pgpool_sup,<0.673.0>]}]},
>>> {trap_exit,true},
>>> {error_handler,error_handler},
>>> {priority,normal},
>>> {group_leader,<0.672.0>},
>>> {total_heap_size,393326},
>>> {heap_size,196650},
>>> {stack_size,33},
>>> {reductions,14837255},
>>> {garbage_collection,[{max_heap_size,#{error_logger => true,
>>> kill => true,
>>> size => 0}},
>>> {min_bin_vheap_size,46422},
>>> {min_heap_size,233},
>>> {fullsweep_after,10},
>>> {minor_gcs,3}]},
>>> {suspending,[]}]
>>> Does someone have an insight of what may be going wrong? I see that the process status is waiting...
>>>
>>> Thank you,
>>> r.
>
>
> --
> J.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191203/e1b39676/attachment.htm>
More information about the erlang-questions
mailing list