[erlang-questions] Node not responding to init:stop()

Tue Jul 3 12:32:59 CEST 2018

Hi Roger,

there have been a few changes in the shutdown procedure (init.erl) with the
introduction of Logger in OTP-21. There are also some changes in the
supervision tree of the Kernel application related to Logger. I don't know
if this has anything to do with your problem, but it is certainly possible.

As you say, Kernel is the last application to be taken down (it's the last
element in the #state.running of application_controller), and the state you
see is probably stale in the sense that all applications listed before
kernel in the 'running' list should already have been terminated. From the
crash dump, can you see how many processes are still alive? Are they all
part of kernel, or are there other applications that did not terminate
completely?
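
If the node is still reachable with erl -remsh, a rough version of the same
count can also be taken on the live node. The module below is a hypothetical
helper (not part of OTP), only a minimal sketch: it uses just
erlang:processes/0 and erlang:process_info/2, so it should not block on the
stuck application_controller. It assumes, as is normally the case, that the
processes of an application have that application's master as their group
leader, so grouping by group leader gives an approximate per-application
picture.

```erlang
%% Hypothetical helper module (not part of OTP): summarise the processes
%% that are still alive on a node whose shutdown appears to be stuck.
%% Only erlang:processes/0 and erlang:process_info/2 are used, so nothing
%% here calls into application_controller.
-module(stuck_node_debug).
-export([live_processes/0, count_by_group_leader/0]).

%% Returns [{Pid, RegisteredName, CurrentFunction, GroupLeader}] for every
%% process that is still alive when process_info/2 is called.
live_processes() ->
    lists:filtermap(
      fun(Pid) ->
              case erlang:process_info(Pid, [registered_name,
                                             current_function,
                                             group_leader]) of
                  undefined ->
                      %% Exited between processes/0 and process_info/2.
                      false;
                  Info ->
                      %% With the list form, an unregistered process
                      %% reports {registered_name, []}.
                      Name = case proplists:get_value(registered_name, Info) of
                                 [] -> undefined;
                                 N -> N
                             end,
                      {true, {Pid, Name,
                              proplists:get_value(current_function, Info),
                              proplists:get_value(group_leader, Info)}}
              end
      end, erlang:processes()).

%% Returns #{GroupLeaderPid => NumberOfLiveProcesses}.
count_by_group_leader() ->
    lists:foldl(
      fun({_Pid, _Name, _Fun, GroupLeader}, Acc) ->
              maps:update_with(GroupLeader, fun(C) -> C + 1 end, 1, Acc)
      end, #{}, live_processes()).
```

After loading it on the node (e.g. with c/1 from the remote shell),
stuck_node_debug:count_by_group_leader() shows which group leaders, and hence
roughly which applications, still own live processes.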

And if we look more closely at the Logger angle: do you know which logger
handlers were active? Are there any children under the logger_sup
supervisor, and if so, what is their state? You mention lager below; do
you know how lager and logger are connected (is lager a handler under
logger, or does it run independently)?
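
For the Logger questions, a similar hypothetical helper (again not part of
OTP, just a sketch) could be loaded on the live node. It assumes OTP-21's
logger:get_handler_config/0 and that Kernel's Logger supervisor is
registered locally as logger_sup. It deliberately avoids
supervisor:which_children/1, which is a synchronous call and could hang if
logger_sup is itself stuck; instead it reads the supervisor's stack trace,
message queue length and links with erlang:process_info/2. The links
include the supervisor's children as well as its parent.

```erlang
%% Hypothetical helper module (not part of OTP): inspect Logger on a node
%% whose shutdown appears to be stuck.
-module(logger_debug).
-export([handler_ids/0, logger_sup_info/0]).

%% One {HandlerId, CallbackModule} pair per installed Logger handler,
%% taken from the config maps returned by logger:get_handler_config/0.
handler_ids() ->
    [{maps:get(id, C), maps:get(module, C)} || C <- logger:get_handler_config()].

%% Look at logger_sup without sending it a request, so this cannot block
%% even if the supervisor is busy or stuck.
logger_sup_info() ->
    case whereis(logger_sup) of
        undefined ->
            not_running;
        Pid ->
            %% Returns undefined if the process exits in the meantime.
            erlang:process_info(Pid, [current_stacktrace,
                                      message_queue_len,
                                      links])
    end.
```

If lager is installed as a Logger handler, it should appear in
logger_debug:handler_ids(); if it runs independently, it will not.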

If you see anything suspicious related to logger, please feel free to send
me the crash dump off-list, and I'll see if I can figure out anything more.

Regards
/siri

On Mon, 2 Jul 2018 at 18:24, Roger Lipscombe <
roger@REDACTED> wrote:

> Right, so I've done some digging. I got a crash dump and by looking at
> the stack trace for application_controller, I can see that it's
> hanging in application_controller:terminate, presumably while waiting
> for one of the processes to stop. The pid given is <0.893.0>. Looking
> a bit further up the stack trace gets me the process state (possibly
> stale?), which contains a bunch of applications and pids. The pid it's
> hanging on belongs to 'kernel', which I'd expect to be the last (?)
> application to be shut down. The process state (assuming it's not
> stale) has a load of other applications in there with pids. There are
> a bunch of ours, but also others, such as runtime_tools, cowboy,
> amqp_client, hackney, lager, etc.
>
> Is any of this useful in finding out why it's hanging?
>
> On 2 July 2018 at 16:54, Roger Lipscombe <roger@REDACTED> wrote:
> > This sounds similar to
> > http://erlang.org/pipermail/erlang-questions/2012-December/071223.html,
> > but it's not quite the same, as far as I can tell.
> >
> > I've noticed that, at some point since upgrading from OTP-20.3 to
> > OTP-21.0 (along with the necessary dependency updates), my Erlang
> > nodes are no longer stopping at the end of our system test run.
> >
> > The nodes are orchestrated by having 'erlexec' run a bash script which
> > uses (effectively) 'foo/bin/foo foreground &'. I'm relying on erlexec
> > killing the bash script and that killing the nodes. This works fine
> > when the nodes are using OTP-20.3, but not with OTP-21.0.
> >
> > If I connect to the node, I can issue 'init:stop()', and it returns
> > ok, but nothing happens. If I use 'application:which_applications()',
> > I get a timeout.
> >
> > Unlike the linked discussion, I _can_ repeatedly connect a remote
> > shell (using erl -remsh), but I have to resort to erlang:halt() to
> > stop the node. Since my system test environment relies on orderly
> > process group teardown to stop the nodes, that's not useful.
> >
> > As I understand it, process group teardown results in SIGTERM, which
> > results in a call to init:stop, which should stop the node. It isn't.
> >
> > How do I figure out why init:stop() isn't working?
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>