Fwd: Re: Supervisor got noproc (looks like a bug)

Maria Scott maria-12648430@REDACTED
Wed Sep 8 13:09:22 CEST 2021


Forgot to reply to the list =^^=
Resending so others can chime in.

> ---------- Original Message ----------
> From: Maria Scott <maria-12648430@REDACTED>
> To: Alexander Petrovsky <askjuise@REDACTED>
> Date: 08.09.2021 12:52
> Subject: Re: Supervisor got noproc (looks like a bug)
> 
>  
> Hi :)
> 
> First, this is partly guesswork, so take it with a grain of salt.
> 
> You have a situation where the child may be terminated by the supervisor (via terminate_child) and may at the same time be terminating by itself (via {stop, ...}), is that right?
> 
> While your child is running, it is linked to the supervisor, but not monitored. When the supervisor is told to shut down (terminate) a child, what it does is this (simplified, see https://github.com/erlang/otp/blob/0bad25713b0bc4a875e9ef7d9b1abcb6a2f75061/lib/stdlib/src/supervisor.erl#L923-L982 for all the details):
> (a) monitor the child
> (b) unlink the child
> (c) check for an EXIT message (in case the child already terminated before the monitoring)
> (d) if there is an EXIT message, flush out the DOWN message and return the EXIT reason (and that's it in this case)
> (e) otherwise, if no EXIT message is there, call exit(Child, shutdown)
> (f) wait for a DOWN message; reasons shutdown and normal are normal exits, everything else produces a shutdown_error
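For reference, steps (a) to (f) can be sketched roughly as follows. This is a simplified illustration, not the OTP source: the module name shutdown_sketch and the function stop_child/1 are made up for this mail, the shutdown timeout and brutal_kill handling are left out, and returning {error, Reason} in step (d) is my assumption about how the EXIT reason is reported. The caller is assumed to trap exits and to be linked to the child, as a supervisor is.

```erlang
-module(shutdown_sketch).
-export([stop_child/1]).

%% Hypothetical helper mirroring steps (a)-(f) above; NOT the OTP code.
%% Assumes the calling process traps exits and is linked to Pid.
stop_child(Pid) ->
    MRef = erlang:monitor(process, Pid),            % (a) monitor the child
    unlink(Pid),                                    % (b) unlink the child
    receive
        {'EXIT', Pid, Reason} ->                    % (c) child died before the unlink
            receive
                {'DOWN', MRef, process, Pid, _} ->  % (d) flush the DOWN message
                    {error, Reason}
            end
    after 0 ->
        exit(Pid, shutdown),                        % (e) ask the child to shut down
        receive                                     % (f) wait for the DOWN message
            {'DOWN', MRef, process, Pid, shutdown} -> ok;
            {'DOWN', MRef, process, Pid, normal}   -> ok;
            {'DOWN', MRef, process, Pid, Other}    -> {error, Other}
        end
    end.
```

Calling stop_child(Pid) from a process that traps exits and has spawn_link'ed a well-behaved child should return ok; any DOWN reason other than shutdown or normal, including noproc, ends up in the {error, Other} branch.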
> 
> Intuitively, this flow should hold no matter if and when the child terminates by itself.
> The key to understanding how the shutdown_error you describe arises is this passage from the docs for monitor/2: "The monitor request is an asynchronous signal. That is, it takes time before the signal reaches its destination." unlink/1, while it is also an asynchronous request that takes time to reach the other process, does something more: it marks the link as inactive on the process calling unlink, and "The exit signal is silently dropped if ... the corresponding link has been deactivated".
> 
> So what I think is happening when the error you describe occurs is this:
> - the supervisor calls monitor(process, Child) (see (a)), but the monitor signal does not reach the child immediately
> - the supervisor unlinks the child (see (b)), deactivating the link
> - the child dies (exits by itself as a result of {stop, ...}); but as it is now unlinked, there is no EXIT message (see (c) and (d))
> - the monitor signal reaches the child too late (or rather, finds no process left to reach), resulting in a DOWN message with reason noproc
> - the supervisor receives the DOWN message (see (f)), and as the reason is not shutdown or normal, it gets propagated, ultimately resulting in the shutdown_error with reason noproc
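The noproc part on its own is easy to check in isolation. The snippet below is a minimal sketch with made-up names (noproc_demo, run/0). It does not reproduce the race itself; it only shows that monitoring a process that is already gone yields a DOWN message with reason noproc.

```erlang
-module(noproc_demo).
-export([run/0]).

%% Hypothetical demo: monitor a process that has already terminated.
run() ->
    %% Start a short-lived process and wait until it is certainly dead.
    {Pid, MRef0} = spawn_monitor(fun() -> ok end),
    receive
        {'DOWN', MRef0, process, Pid, normal} -> ok
    end,
    %% Monitoring the dead pid now delivers a DOWN message right away.
    MRef = erlang:monitor(process, Pid),
    receive
        {'DOWN', MRef, process, Pid, Reason} -> Reason
    end.
```

noproc_demo:run() returns the atom noproc, which is the reason that would then fall outside the shutdown and normal cases of step (f).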
> 
> As I said, this is pieced together from some (educated) guesswork ;) Don't rely on it until somebody else confirms it.
> 
> Kind regards,
> Maria

