[erlang-questions] release handler crash during relup, sys:get_status noproc

Éric Pailleau eric.pailleau@REDACTED
Mon Aug 24 22:55:15 CEST 2015


Hi,

A pid whose first number is not 0 is not a local pid.
Is your release upgrading a distributed application?
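
For what it's worth, you don't have to rely on the printed form to check this. Here is a minimal sketch, assuming you still have the pid as a term (for example from a crash report or a trace). The module name pid_origin and the function is_local/1 are made up for illustration; node/0 and node/1 are the standard BIFs.

```erlang
-module(pid_origin).
-export([is_local/1]).

%% Hypothetical helper: true if Pid was created on the node
%% that runs this code, false if it belongs to another node.
is_local(Pid) when is_pid(Pid) ->
    node(Pid) =:= node().
```

Calling pid_origin:is_local(self()) returns true. For a pid that prints with a non-zero first number, such as the one in the crash, it should return false, and node(Pid) names the node the process lives on.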


On 24 August 2015 at 18:20, Richard Jones <rj@REDACTED> wrote:
>
> Has anyone else experienced a crash like this when doing a release upgrade,
> i.e. when calling release_handler:install_release with a valid relup?
>
> {"init terminating in do_boot",{{badmatch,{error,{'EXIT',{noproc,{sys,get_status,[<6453.14610.13>]}}}}},[{erl_eval,expr,3,[]}]}}
>
> I've seen this a couple of times now (Erlang 17.x) when upgrading production systems under load, even with a trivial relup. I have no idea what that pid was.
>
> I think it might be a race in release_handler_1, which calls sys:get_status without a catch: the process in question may belong to a supervision tree that legitimately shut down after the list of pids was fetched.
>
> i.e.:
>
> https://github.com/erlang/otp/blob/OTP-17.5.6.3/lib/sasl/src/release_handler_1.erl#L589
>
> which calls get_proc_state, which does:
>
> {status, _, {module, _}, [_, State, _, _, _]} = sys:get_status(Proc)
>
> I've not managed to write a test for this yet. I'm planning to spam lots of terminate_child calls to a busy supervisor while calling release_handler_1:get_supervised_procs, to try to reproduce it.
>
> If I'm right, it would only be triggered if parts of a supervision tree are shutting down during a release upgrade, which perhaps isn't very common, depending on how dynamic the average supervision tree is in Erlang apps.
>
> Any feedback appreciated before I spend more time studying release handler code :)
>
> RJ

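To make the suspected race in the quoted message concrete: if the process exits between the moment the pid list is fetched and the sys:get_status call, the call exits with {noproc, {sys, get_status, [Pid]}}, which is the shape seen in the crash above. Below is a rough sketch of a guarded variant. It is not the OTP code and not a proposed patch; the module name safe_status and the {ok, State} | {error, Reason} return shape are assumptions for illustration only. The status pattern is the one quoted from get_proc_state.

```erlang
-module(safe_status).
-export([get_proc_state/1]).

%% Hypothetical guarded version of the lookup quoted above.
%% Returns {ok, State} for a live process, or {error, Reason}
%% if sys:get_status/1 exits (e.g. noproc for a dead process).
get_proc_state(Proc) ->
    try sys:get_status(Proc) of
        {status, _, {module, _}, [_, State, _, _, _]} ->
            {ok, State}
    catch
        %% sys wraps the failure as {Reason, {sys, get_status, Args}}
        exit:{Reason, {sys, get_status, _}} ->
            {error, Reason}
    end.
```

With something like this, a caller walking the supervision tree could skip processes that return {error, noproc} instead of crashing the whole upgrade. Whether skipping is actually safe for release_handler is a separate question that this sketch does not answer.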
