[erlang-questions] Debugging scheduler not responding to erts_schedule_misc_aux_work

anthonym@REDACTED anthonym@REDACTED
Fri Jun 29 03:13:08 CEST 2018


This may or may not apply, but we had a similar problem with system-level
work being scheduled on freed processes, and have been debugging it with
Ericsson via

https://bugs.erlang.org/browse/ERL-573

for the last few months.  One of the later comments mentions a branch of
OTP 20 which might help, and might be worth a try.

HTH,

-Anthony


> I’m trying to debug a weird condition in which any misc system task hangs.
> It seems to affect OTP 20 (but not 16) on FreeBSD 10.3 and 11.
>
> It is a rare problem that appears after 5-7 days under some load (~40% CPU
> on average on a 48-core server).
>
> There is also a problem with erlang:statistics(runtime), which is affected
> by this bug in the FreeBSD kernel:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227689 (so
> erlang:statistics(runtime) always returns the same value); however, I
> doubt it can affect anything.
>
> What happens: there are several calls, e.g.
> erlang:statistics(garbage_collection), ets:all(),
> erts_internal:system_check() and a few more. All of them call
> erts_schedule_misc_aux_work. A misc aux work item is put into every
> scheduler's queue, and it seems that all schedulers except one respond.
> The VM is still working and all other processes are fine, but the process
> that made the call is waiting in erlang:gc_info/2 (or the corresponding
> function for another call), with its counter equal to 1. Since there is
> no timeout in the receive statement, it waits forever.
>
> How do I debug this? Is there any way to find the scheduler that
> misbehaves? It is one of the normal schedulers. I’m using gdb to attach to
> the BEAM VM.
>
> Unfortunately, I cannot run a debug VM (it is not able to handle the load).
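
Since the receive described above has no timeout, one way to notice the hang
without blocking the calling process is to run the call in a helper process
and give up after a deadline. The following is only a minimal sketch: the
module name aux_probe and both function names are hypothetical, and it has
not been tried against a VM in the stuck state, so whether the helper can
still be killed cleanly there is an assumption. It detects that a reply is
missing; it does not identify which scheduler failed to respond.

```erlang
-module(aux_probe).
-export([call_with_timeout/2, probe/1]).

%% Run Fun in a monitored helper process and wait at most Timeout
%% milliseconds for it to finish. The helper reports its result through
%% its exit reason, so no extra reply message is needed.
call_with_timeout(Fun, Timeout) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({result, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {result, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The helper crashed instead of returning a value.
            {error, Reason}
    after Timeout ->
        %% No answer in time: stop monitoring, kill the helper, report it.
        erlang:demonitor(Ref, [flush]),
        exit(Pid, kill),
        timeout
    end.

%% Probe one of the calls from the report that fans out to every scheduler.
probe(Timeout) ->
    call_with_timeout(fun() -> erlang:statistics(garbage_collection) end,
                      Timeout).
```

Calling aux_probe:probe(5000) periodically, for example from a monitoring
process, should return {ok, _} on a healthy node and timeout once a reply
goes missing, which at least narrows down when the problem started.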