[erlang-questions] Debugging scheduler not responding to erts_schedule_misc_aux_work

Maxim Fedorov dane@REDACTED
Wed Jul 11 18:29:41 CEST 2018

Thanks, Anthony.

This patch (and Erlang 20.3.8 with its ERTS) does not fix the problem. The same issue has been observed again (2 servers out of 384 stopped responding to aux work). I believe it's a different issue.

On 6/28/18, 18:14, "anthonym@REDACTED" <anthonym@REDACTED> wrote:

    It may or may not apply but we had a similar problem with system level
    work being scheduled on freed processes, and have been debugging it with
    Ericsson via
    for the last few months.  There's a branch of 20 in one of the later
    comments which might help, and might be worth a try.
    > I’m trying to debug a weird condition in which any misc system task hangs.
    > It seems to affect OTP 20 (but not 16) on FreeBSD 10.3 and 11.
    > It is a rare problem that shows up after 5-7 days under some load (~40% CPU
    > on average on a 48-core server).
    > There is also a problem with erlang:statistics(runtime), which is affected
    > by this bug in the FreeBSD kernel:
    > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227689
    > (so erlang:statistics(runtime) always returns the same value); however, I
    > doubt it can affect anything.
    > What happens: there are several calls, e.g.
    > erlang:statistics(garbage_collection), ets:all(),
    > erts_internal:system_check() and a few more. All of them call
    > erts_schedule_misc_aux_work. A misc aux work item is put into every
    > scheduler's queue, and it seems that all of them except one respond. The
    > VM is still working and all other processes are fine, but the process that
    > made the call is waiting in erlang:gc_info/2 (or another corresponding
    > function) with the counter equal to 1. Since there is no timeout in the
    > receive statement, it waits forever.
    > How do I debug this? Is there any way to find the scheduler that
    > misbehaves? It is one of the normal schedulers. I’m using gdb to attach to
    > the BEAM VM. Unfortunately, I cannot run the debug VM (it is not able to
    > handle the load).
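
The hang described in the quoted message (a caller blocked in a receive with no timeout, waiting for the last scheduler's reply) can at least be detected from inside a running node without a debug VM. What follows is only a minimal sketch: the module name aux_probe and both helper functions are hypothetical, not part of OTP, and it assumes that erlang:processes/0 and erlang:process_info/2 themselves still work on an affected node. It does not identify the misbehaving scheduler; it only keeps the calling process from blocking and lists the processes that are already stuck.

```erlang
-module(aux_probe).
-export([call_with_timeout/2, stuck_in/2]).

%% Hypothetical helper: run Fun in a separate, monitored process so that a
%% call which never returns cannot block the caller forever.
call_with_timeout(Fun, TimeoutMs) when is_function(Fun, 0) ->
    Parent = self(),
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Tag, Fun()} end),
    receive
        {Tag, Result} ->
            %% The worker answered; drop the monitor and any pending 'DOWN'.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker crashed before it could send a result.
            {error, Reason}
    after TimeoutMs ->
            %% The worker is presumably stuck. It is left alive on purpose so
            %% that it can be inspected; a late reply may still arrive in the
            %% caller's mailbox and should be ignored.
            erlang:demonitor(MRef, [flush]),
            {timeout, Pid}
    end.

%% Hypothetical helper: list the processes whose current function is
%% Mod:Fun with any arity. Processes that exit during the scan make
%% process_info/2 return undefined and are skipped.
stuck_in(Mod, Fun) ->
    [Pid || Pid <- erlang:processes(),
            case erlang:process_info(Pid, current_function) of
                {current_function, {Mod, Fun, _Arity}} -> true;
                _ -> false
            end].
```

For example, aux_probe:call_with_timeout(fun() -> erlang:statistics(garbage_collection) end, 5000) returns {timeout, Pid} instead of hanging, and aux_probe:stuck_in(erlang, gc_info) lists the processes that are currently waiting in that function.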
