[erlang-bugs] Strange application shutdown deadlock

Tim Watson watson.timothy@REDACTED
Fri May 24 16:55:43 CEST 2013


Hi Fred,

On 24 May 2013, at 15:45, Fred Hebert wrote:

> Quick question: are you running a release?
> 
> If so, last time I've seen deadlocks like that was solved by making sure
> *all* my applications did depend on stdlib and kernel in their app file.
> When I skipped them, sometimes I'd find that things would lock up.
> 

No, unfortunately RabbitMQ doesn't run as part of a release.
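
That said, the check you describe doesn't need a release. Here is a minimal sketch of how one might look for loaded applications that don't list kernel and stdlib in their app file. The module dep_check and its function names are hypothetical (not part of RabbitMQ or OTP); it relies only on application:loaded_applications/0 and application:get_key/2:

```erlang
-module(dep_check).
-export([missing_base_deps/0, missing_base_deps/1]).

%% Hypothetical helper: check every application currently loaded on the node.
missing_base_deps() ->
    missing_base_deps([App || {App, _Desc, _Vsn} <-
                                  application:loaded_applications()]).

%% Returns [{App, MissingDeps}] for each application whose `applications'
%% key omits kernel and/or stdlib. kernel and stdlib themselves are skipped,
%% as are applications that are not loaded (get_key returns undefined).
missing_base_deps(Apps) ->
    [{App, Missing} ||
        App <- Apps,
        App =/= kernel, App =/= stdlib,
        {ok, Deps} <- [application:get_key(App, applications)],
        Missing <- [[kernel, stdlib] -- Deps],
        Missing =/= []].
```

An empty list from dep_check:missing_base_deps() would mean every loaded application declares both.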

> My guess was that dependencies from stdlib or kernel got unloaded before
> my app and broke something, but I'm not sure -- In my case, I wasn't
> able to inspect the node as it appeared to be 100% blocked.
> 

I suppose it's possible that something similar could happen to us, with a different set of apps. I can't see how the release handler would be involved though, since we start our nodes with -boot start_sasl and launch applications by hand...

The code we use to shut applications down explicitly calculates the dependency order itself, so perhaps there's something wrong in there. What we do is essentially this:

stop() ->
    case whereis(rabbit_boot) of
        undefined -> ok;
        _         -> await_startup()
    end,
    rabbit_log:info("Stopping RabbitMQ~n"),
    ok = app_utils:stop_applications(app_shutdown_order()).

stop_and_halt() ->
    try
        stop()
    after
        rabbit_misc:local_info_msg("Halting Erlang VM~n", []),
        init:stop()
    end,
    ok.

app_shutdown_order() ->
    Apps = ?APPS ++ rabbit_plugins:active(),
    app_utils:app_dependency_order(Apps, true).

And app_utils calculates that shutdown order thus:

app_dependency_order(RootApps, StripUnreachable) ->
    %% One vertex per loaded application, one edge Dep -> App per dependency.
    {ok, G} = rabbit_misc:build_acyclic_graph(
                fun (App, _Deps) -> [{App, App}] end,
                fun (App,  Deps) -> [{Dep, App} || Dep <- Deps] end,
                [{App, app_dependencies(App)} ||
                    {App, _Desc, _Vsn} <- application:loaded_applications()]),
    try
        %% Optionally keep only the vertices reachable from RootApps in G.
        case StripUnreachable of
            true  -> digraph:del_vertices(G, digraph:vertices(G) --
                         digraph_utils:reachable(RootApps, G));
            false -> ok
        end,
        %% topsort lists a dependency before the applications that need it.
        digraph_utils:topsort(G)
    after
        true = digraph:delete(G)
    end.
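
Stripped of the RabbitMQ specifics, the idea behind that ordering is roughly the following. This is only a sketch: the module dep_order, its function names and the {App, [Dep]} input shape are made up for illustration, and it uses plain digraph rather than rabbit_misc:build_acyclic_graph. It also assumes that the stop order is simply the reverse of the start order.

```erlang
-module(dep_order).
-export([start_order/1, stop_order/1]).

%% Deps is a list of {App, [Dependency]} pairs (hypothetical input shape).
%% Returns the applications sorted so that each dependency comes before
%% the applications that need it.
start_order(Deps) ->
    G = digraph:new([acyclic]),
    try
        lists:foreach(
          fun ({App, AppDeps}) ->
                  digraph:add_vertex(G, App),
                  lists:foreach(fun (Dep) -> add_dep(G, Dep, App) end,
                                AppDeps)
          end, Deps),
        digraph_utils:topsort(G)
    after
        %% Always free the ETS tables behind the digraph.
        true = digraph:delete(G)
    end.

%% Assumption: applications are stopped in the reverse of the start order.
stop_order(Deps) ->
    lists:reverse(start_order(Deps)).

%% Edge Dep -> App means "Dep must be running before App".
%% An acyclic digraph rejects any edge that would close a cycle.
add_dep(G, Dep, App) ->
    digraph:add_vertex(G, Dep),
    case digraph:add_edge(G, Dep, App) of
        {error, Reason} -> throw({bad_dependency, Dep, App, Reason});
        _Edge           -> ok
    end.
```

For a simple chain where app_c needs app_b and app_b needs app_a, start_order/1 gives app_a, app_b, app_c, and stop_order/1 gives the reverse.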

So even if we've shut things down in the wrong order - which I don't think we have - I still don't see where the `get_child' request comes from if the release_handler isn't involved...

Cheers,
Tim




