[erlang-questions] processes stuck in erlang:bif_return_trap/1

H.Li@REDACTED H.Li@REDACTED
Mon Sep 8 22:30:16 CEST 2014


Dear All,

I wonder if anyone could help me with this.

We have been experiencing some problems with an Erlang node running
Mnesia. For some reason, the node becomes unresponsive every few minutes,
but the system does recover after a short period of time.

We did some profiling using etop, and it shows that when the system
freezes, about 5000 processes have accumulated on this node, all stuck
in the erlang:bif_return_trap/1 function.  These processes are
spawned by an rpc_server running on this node; each process computes
M:F(A), then either sends the result back via gen_tcp or stores it in an
ets table.
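
To make the worker pattern concrete, here is a minimal sketch of what each
spawned process does. It is not the real rpc_socket:worker/6 code; the
module name worker_sketch, the function run/4 and the {socket, ...} and
{ets, ...} tuples are made up for illustration only.

```erlang
%% Hypothetical sketch only; the real rpc_socket:worker/6 is not shown here.
-module(worker_sketch).
-export([run/4]).

%% Evaluate M:F(A), then deliver the result either over a TCP socket
%% or into an ETS table, depending on the (assumed) destination tuple.
run(M, F, A, {socket, Sock}) ->
    Result = erlang:apply(M, F, A),
    gen_tcp:send(Sock, term_to_binary(Result));
run(M, F, A, {ets, Tab, Key}) ->
    Result = erlang:apply(M, F, A),
    true = ets:insert(Tab, {Key, Result}),
    ok.
```

In this sketch the server would start one such process per request, for
example with spawn(worker_sketch, run, [M, F, A, Dest]).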

I don't quite understand what erlang:bif_return_trap/1 does, and am
confused about why so many processes get stuck in this function. The
Erlang node is running on a machine with 12 physical cores (hence 24
schedulers), and the version of Erlang is R15B03.  Here is part of the
etop output:

Load:  cpu       317       Memory:  total    66521736    binary     393593
        procs    5785               processes 1253072    code        12280
        runq        0              atom          493    ets      64843631
Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
----------------------------------------------------------------------------------------
<5291.115.0>   mnesia_tm                '-'       0********         0 mnesia_tm:doit_loop/
<5291.6.0>     error_logger             '-'       012342496         0 gen_event:fetch_msg/
<5291.61.0>    memsup                   '-'     138 4714912         0 gen_server:loop/6
<5291.1464.0>  proc_lib:init_p/5        '-'     103 2914384         0 gen_fsm:loop/7
<5291.1466.0>  proc_lib:init_p/5        '-'     105 2914384         0 gen_fsm:loop/7
          .
          .
          .
***************rpc_socket:worker/6      '-'    8141  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    7900  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    7901  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    7954  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    7975  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    8068  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    7956  101144       0 erlang:bif_return_tr
***************rpc_socket:worker/6      '-'    8212  101144       0 erlang:bif_return_tr
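
In case it helps anyone reproduce the listing above without etop, the
following is a minimal sketch of how the same processes could be picked
out on the affected node. The module name trap_inspect and the function
stuck/0 are made up for illustration; only erlang:processes/0 and
erlang:process_info/2 are standard calls. It assumes that the stuck
workers report erlang:bif_return_trap as their current function, as they
do in the etop output.

```erlang
%% Hypothetical helper, not part of the system described in this post.
-module(trap_inspect).
-export([stuck/0]).

%% Return {Pid, Info} for every local process whose current function is
%% erlang:bif_return_trap (any arity). Info is 'undefined' if the process
%% exited between the two process_info calls.
stuck() ->
    [{Pid, erlang:process_info(Pid, [status, message_queue_len, backtrace])}
     || Pid <- erlang:processes(),
        in_trap(erlang:process_info(Pid, current_function))].

%% True only for a process currently inside erlang:bif_return_trap.
in_trap({current_function, {erlang, bif_return_trap, _Arity}}) -> true;
in_trap(_) -> false.
```

Since erlang:processes/0 only lists local processes, this would have to be
run on the node that freezes, for example through rpc:call/4 from another
node.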

Many Thanks!

Huiqing

