Supervisor got noproc (looks like a bug)

Alexander Petrovsky askjuise@REDACTED
Tue Sep 7 19:21:04 CEST 2021


Hello!

Recently, during a performance test, I noticed that when the supervisor
tries to terminate a child it sometimes gets 'noproc'. My Erlang
version is OTP-23.2.5.

Under some circumstances, the child (a 'gen_server:start_link/3'
process) decides to stop itself with '{stop, normal, State}'. At the
same time, another process decides to stop the same child with
'supervisor:terminate_child/2'.
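For illustration, here is a minimal sketch of such a racing child (the
module name and timings are my assumptions, not the original code): a
temporary worker that finishes its work and stops normally on its own,
while some other process may concurrently ask the supervisor to
terminate it.

```erlang
%% Hypothetical sketch of the racing child, not the original code:
%% a temporary gen_server that does a bit of work and then stops
%% itself with {stop, normal, State}.
-module(my).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Pretend the work takes a few milliseconds, then time out.
    {ok, no_state, 10}.

handle_info(timeout, State) ->
    %% The child decides on its own that it is done.
    {stop, normal, State}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Meanwhile another process races against the normal stop with
'supervisor:terminate_child(my_sup, Pid)'.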

In some (rare, but reproducible) cases I've got in logs:

   supervisor: {local,my_sup}
   errorContext: shutdown_error
   reason: noproc
   offender: [{pid,<0.4890.1539>},
              {id,my},
              {mfargs,{my,start_link,[]}},
              {restart_type,temporary},
              {shutdown,5000},
              {child_type,worker}]
(supervisor:do_terminate/2)

According to https://github.com/erlang/otp/blob/OTP-23.2.5/lib/stdlib/src/supervisor.erl#L873-L919
this can happen when the child terminates with reason 'noproc', or
when the "child did unlink and the child dies before monitor" (quoting
the comment in the source). But the termination reason in all cases is
'{stop, normal, State}', and the children never call 'erlang:unlink/1'
at all!
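For reference, the relevant termination path is roughly the following
(a paraphrased sketch of 'supervisor:monitor_child/1' from the linked
source, not a verbatim copy), which shows where 'noproc' can surface:

```erlang
%% Paraphrased sketch of supervisor:monitor_child/1 (OTP 23). The
%% supervisor turns the link into a monitor before sending the
%% shutdown signal.
monitor_child(Pid) ->
    %% If Pid is already dead, this monitor immediately delivers
    %% {'DOWN', Ref, process, Pid, noproc}.
    erlang:monitor(process, Pid),
    unlink(Pid),
    receive
        %% An 'EXIT' already in the mailbox means the child died while
        %% still linked; the real exit reason is preserved here and
        %% the 'DOWN' message is flushed.
        {'EXIT', Pid, Reason} ->
            receive
                {'DOWN', _, process, Pid, _} ->
                    {error, Reason}
            end
    after 0 ->
        %% No 'EXIT' in the mailbox: the child is assumed alive, and
        %% a later {'DOWN', ..., noproc} is reported by shutdown/2 as
        %% a shutdown error.
        ok
    end.
```

In the failing case the supervisor apparently takes the 'after 0'
branch, i.e. the 'DOWN' with reason 'noproc' arrives without a
corresponding 'EXIT' ever being received.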

I gathered a trace with:

dbg:tracer(process, {fun(Msg, _) -> syslog:info_msg("Trace ~0p~n", [Msg]), 0 end, 0}).
dbg:p(whereis(my_sup), [m, c]).
dbg:tp(erlang, exit, [{'_', [], [{return_trace}]}]).
dbg:tp(erlang, unlink, [{'_', [], [{return_trace}]}]).
dbg:tp(erlang, monitor, [{'_', [], [{return_trace}]}]).
dbg:p(all, c).

And I see different flows! When the supervisor gets 'noproc', the
flow looks like this:

2021-09-07T15:53:56.431343Z <0.5314.0> my: started
2021-09-07T15:53:56.433941Z Trace
{trace,<0.1089.0>,'receive',{ack,<0.5314.0>,{ok,<0.5314.0>}}}
2021-09-07T15:53:56.434081Z Trace
{trace,<0.1089.0>,send,{#Ref<0.1919367765.3881566209.208162>,{ok,<0.5314.0>}},<0.5284.0>}
2021-09-07T15:53:56.442202Z <0.5314.0> my: finished
2021-09-07T15:53:56.447844Z Trace
{trace,<0.1089.0>,'receive',{'$gen_call',{<0.5284.0>,#Ref<0.1919367765.3881566210.214548>},{terminate_child,<0.5314.0>}}}
2021-09-07T15:53:56.543227Z Trace
{trace,<0.1089.0>,call,{erlang,monitor,[process,<0.5314.0>]}}
2021-09-07T15:53:56.543352Z Trace
{trace,<0.1089.0>,return_from,{erlang,monitor,2},#Ref<0.1919367765.3881566212.211028>}
2021-09-07T15:53:56.543467Z Trace
{trace,<0.1089.0>,call,{erlang,unlink,[<0.5314.0>]}}
2021-09-07T15:53:56.543676Z Trace
{trace,<0.1089.0>,'receive',{'DOWN',#Ref<0.1919367765.3881566212.211028>,process,<0.5314.0>,noproc}}
2021-09-07T15:53:56.543854Z Trace
{trace,<0.1089.0>,call,{erlang,exit,[<0.5314.0>,shutdown]}}

In all other (normal) cases the flow looks like this:

2021-09-07T15:54:24.489280Z <0.6838.0> my: started
2021-09-07T15:54:24.491185Z Trace
{trace,<0.1089.0>,'receive',{ack,<0.6838.0>,{ok,<0.6838.0>}}}
2021-09-07T15:54:24.491344Z Trace
{trace,<0.1089.0>,send,{#Ref<0.1919367765.3881566210.231347>,{ok,<0.6838.0>}},<0.6809.0>}
2021-09-07T15:54:24.514841Z <0.6838.0> my: finished
2021-09-07T15:54:24.529580Z Trace
{trace,<0.1089.0>,'receive',{'$gen_call',{<0.6809.0>,#Ref<0.1919367765.3881566209.224355>},{terminate_child,<0.6838.0>}}}
2021-09-07T15:54:24.616316Z Trace
{trace,<0.1089.0>,'receive',{'EXIT',<0.6838.0>,normal}}
2021-09-07T15:54:24.618596Z Trace
{trace,<0.1089.0>,call,{erlang,monitor,[process,<0.6838.0>]}}
2021-09-07T15:54:24.618760Z Trace
{trace,<0.1089.0>,call,{erlang,unlink,[<0.6838.0>]}}
2021-09-07T15:54:24.619175Z Trace
{trace,<0.1089.0>,'receive',{'DOWN',#Ref<0.1919367765.3881566212.227125>,process,<0.6838.0>,noproc}}
2021-09-07T15:54:24.619275Z Trace
{trace,<0.1089.0>,send,{#Ref<0.1919367765.3881566209.224355>,ok},<0.6809.0>}

Does this look like a very tricky race, or like Erlang losing EXIT messages?

-- 
Alexander Petrovsky

