[erlang-questions] Node not responding to init:stop()

Roger Lipscombe roger@REDACTED
Mon Jul 2 18:24:29 CEST 2018


Right, so I've done some digging. I got a crash dump, and the stack
trace for application_controller shows it hanging in
application_controller:terminate, presumably while waiting for one of
the application processes to stop. The pid it's waiting on is
<0.893.0>. A bit further up the stack trace is the process state
(possibly stale?), which lists applications along with their pids.
According to that state, <0.893.0> belongs to 'kernel', which I'd
expect to be the last (?) application to be shut down. Assuming the
state isn't stale, it still lists a load of other applications with
pids: a bunch of ours, but also others, such as runtime_tools, cowboy,
amqp_client, hackney, lager, etc.

Is any of this useful in finding out why it's hanging?
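
For anyone wanting to look at the same thing on a live node rather than
in a crash dump, here is a minimal sketch. It assumes the node still
accepts a remote shell (mine does), and the module name stop_debug and
its function names are made up for illustration. It relies on
erlang:process_info/2, which reads the process directly instead of
sending it a message, so it should still answer when
application_controller itself is blocked.

```erlang
%% Hypothetical helper module; the name and functions are illustrative only.
-module(stop_debug).
-export([inspect/1, controller/0]).

%% Report what a local process is currently doing. No message is sent
%% to the process, so this returns even if the process is stuck.
%% Returns 'undefined' if the process is no longer alive.
inspect(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [registered_name,
                              status,
                              current_stacktrace,
                              message_queue_len,
                              links]).

%% Same report for application_controller itself.
controller() ->
    case whereis(application_controller) of
        undefined -> undefined;
        Pid -> inspect(Pid)
    end.
```

From a remote shell, stop_debug:controller() should show whether the
controller is still sitting in terminate, and
stop_debug:inspect(pid(0,893,0)) would look at the pid from my crash
dump. The pid on a live node may well be different.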

On 2 July 2018 at 16:54, Roger Lipscombe <roger@REDACTED> wrote:
> This sounds similar to
> http://erlang.org/pipermail/erlang-questions/2012-December/071223.html,
> but it's not quite the same, as far as I can tell.
>
> I've noticed that, at some point since upgrading from OTP-20.3 to
> OTP-21.0 (along with the necessary dependency updates), my Erlang
> nodes are no longer stopping at the end of our system test run.
>
> The nodes are orchestrated by having 'erlexec' run a bash script which
> uses (effectively) 'foo/bin/foo foreground &'. I'm relying on erlexec
> killing the bash script and that killing the nodes. This works fine
> when the nodes are using OTP-20.3, but not with OTP-21.0.
>
> If I connect to the node, I can issue 'init:stop()', and it returns
> ok, but nothing happens. If I use 'application:which_applications()',
> I get a timeout.
>
> Unlike the linked discussion, I _can_ repeatedly connect a remote
> shell (using erl -remsh), but I have to resort to erlang:halt() to
> stop the node. Since my system test environment relies on orderly
> process group teardown to stop the nodes, that's not useful.
>
> As I understand it, process group teardown results in SIGTERM, which
> results in a call to init:stop, which should stop the node. It doesn't.
>
> How do I figure out why init:stop() isn't working?
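
One part of that SIGTERM path can be checked from a remote shell. Since
OTP 20, OS signals are delivered to the erl_signal_server event manager
in kernel, and the stock erl_signal_handler is what turns SIGTERM into
init:stop(). A minimal sketch, assuming the default handler has not
been replaced; the module name sigterm_check is made up for
illustration:

```erlang
%% Hypothetical helper module; the name is illustrative only.
-module(sigterm_check).
-export([handlers/0]).

%% List the handlers attached to the kernel's signal event manager.
%% On an untouched node this is expected to include erl_signal_handler.
handlers() ->
    gen_event:which_handlers(erl_signal_server).
```

If erl_signal_handler is listed, the signal is at least being routed
towards init:stop(), which points back at the shutdown sequence itself
rather than at signal delivery.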


