[erlang-questions] Debugging scheduler not responding to erts_schedule_misc_aux_work

Dave Cottlehuber dch@REDACTED
Sat Jun 30 14:06:20 CEST 2018


On Thu, 28 Jun 2018, at 22:35, Maxim Fedorov wrote:
> I’m trying to debug some weird condition when any misc system
> task hangs. It seems to affect OTP 20 (but not 16) on FreeBSD 10.3
> and 11.
>
> It is a rare problem happening after 5-7 days under some load
> (~40% cpu average on a 48 cores server).
>
> There is also a problem with erlang:statistics(runtime), affected by
> this bug in FreeBSD kernel:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227689 (so
> statistics:runtime() always returns the same value), however I
> doubt it can affect anything.
>
> What happens:  there are several calls, e.g.
> erlang:statistics(garbage_collection), ets:all(),
> erts_internal:system_check() and few more. All of them do
> erts_schedule_misc_aux_work. A misc aux work item is put into every
> scheduler queue, and it seems that all of them except one
> respond. VM is still working, all other processes are fine, but the
> one that did the call is waiting in erlang:gc_info/2 (or another
> corresponding function), with counter equals to 1. Since there is no
> timeout in receive statement, it waits forever.
>
> How do I debug this?
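
One cheap way to catch the condition described above without blocking
the caller is to run the call in a throwaway process and wait for it
with a timeout. This is only a sketch: the module name aux_probe and
the function probe/1 are made up for illustration, and it assumes the
helper gets parked in erlang:gc_info/2 (or similar) the same way the
original caller does.

```erlang
-module(aux_probe).
-export([probe/1]).

%% Run a call that fans out misc aux work in a helper process, so that
%% a hang blocks the helper rather than the caller.
%% aux_probe and probe/1 are hypothetical names, not part of OTP.
probe(TimeoutMs) ->
    {Pid, Ref} = spawn_monitor(fun() ->
        exit({done, erlang:statistics(garbage_collection)})
    end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after TimeoutMs ->
        %% The helper has not finished: record where it is parked.
        %% process_info/2 returns undefined if it exited in the meantime.
        Info = erlang:process_info(Pid, [current_function, status]),
        erlang:demonitor(Ref, [flush]),
        {stuck, Pid, Info}
    end.
```

Calling aux_probe:probe(5000) periodically should return {ok, _} on a
healthy node. A {stuck, Pid, Info} result leaves the helper alive, so it
can be inspected further once the hang has been reproduced.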

I get that your load in production is high, but will a targeted dtrace
probe be lightweight enough?
Also, wrt Anthony’s comments, are you building with clang or gcc? I’m not
clear if that’s relevant.
A+
Dave






