BUG: fatal interaction between application:ensure_all_started(A) and permit(B, false)

Mon Mar 22 13:23:54 CET 2021

When I started looking more closely into this, it appeared that there is a
long-standing bug in the application_controller regarding permissions.

And by "long-standing" I mean that it was already there when Kostis did some
Tidier-based cleanup 11 years ago. Kostis didn't introduce it, though.

When servicing a start request for which permission(App) == false, the
application_controller adds a new entry to the `start_p_false` list,
i.e. one new entry per request.
https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L689-L690

... but when servicing a subsequent {permit_application, App, true}, it
uses lists:keydelete/3 to remove the App from the `start_p_false` list.
https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L759-L761

lists:keydelete/3 obviously only removes the first matching entry.
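To make the mismatch concrete, here is a small self-contained sketch using plain `lists` calls on a hypothetical, simplified `start_p_false` list. The `{App, From}` entry shape, the module name and the `fromN` atoms are assumptions for illustration (the real controller stores more fields per entry), and `lists:keyfind/3` is only a stand-in for however the controller looks up the pending request. It shows that repeated requests pile up, that a first-match lookup and `lists:keydelete/3` only ever see the newest one, and that something like `lists:partition/2` would be needed to pick up all of them.

```erlang
-module(start_p_false_demo).
-export([run/0]).

%% Hypothetical, simplified stand-in for start_p_false: one {App, From}
%% entry per start request received while permission(App) == false.
%% The real entries carry more fields; this shape is assumed.
run() ->
    %% Three start requests for b; each new entry is prepended,
    %% so the head of the list is the most recent request.
    SPF0 = lists:foldl(fun(From, Acc) -> [{b, From} | Acc] end,
                       [], [from1, from2, from3]),
    %% A first-match lookup sees only the newest request.
    {b, from3} = lists:keyfind(b, 1, SPF0),
    %% keydelete/3 removes only that one entry; two stale ones remain.
    [{b, from2}, {b, from1}] = lists:keydelete(b, 1, SPF0),
    %% Splitting out every entry for b would let all waiting callers
    %% be answered and leave nothing behind for b.
    {Pending, Rest} = lists:partition(fun({App, _}) -> App =:= b end, SPF0),
    {[from3, from2, from1], []} = {[F || {_, F} <- Pending], Rest},
    ok.
```

In the shell, `c(start_p_false_demo), start_p_false_demo:run().` returns `ok`; if any of the matches did not hold, the call would fail with a `badmatch` error.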

Earlier in that function, it also only locates the first pending request
(or rather, chronologically the last), and uses its `From` in the call to
`spawn_starter()`.

The rest of the pending requests should be handled somewhere (likely in
`handle_application_started/3`), but they aren't.
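For anyone joining the thread here, the symptom from the original report (quoted below) can be reproduced in a pure model, which may also help when comparing the alternatives. This is a hypothetical sketch, NOT the OTP implementation: the module and function names are made up, a plain list of atoms stands in for the set of running applications, and the naive retry loop is bounded by `MaxTries` so that it terminates. `run_checked/1` sketches alternative 1, i.e. a permission check in the looping function.

```erlang
-module(ensure_loop_model).
-export([run/2, run_checked/1]).

%% Hypothetical model, NOT the OTP source: application a depends on b,
%% Permitted is the value of permission(b), and Running is the list of
%% started applications.

%% Stand-in for application:start/1 on this two-application system.
start(a, Running, _Permitted) ->
    case lists:member(b, Running) of
        true  -> {ok, [a | Running]};
        false -> {{error, {not_started, b}}, Running}
    end;
start(b, Running, true) ->
    {ok, [b | Running]};
start(b, Running, false) ->
    %% permission(b) == false: the call returns ok, but b is not started
    %% (the real controller instead queues the request in start_p_false).
    {ok, Running}.

%% Naive retry loop; MaxTries bounds it so that the model terminates.
run(Permitted, MaxTries) -> loop(Permitted, [], MaxTries).

loop(_Permitted, _Running, 0) ->
    {error, gave_up};
loop(Permitted, Running, N) ->
    case start(a, Running, Permitted) of
        {ok, Running1} ->
            {ok, Running1};
        {{error, {not_started, Dep}}, Running1} ->
            {ok, Running2} = start(Dep, Running1, Permitted),
            loop(Permitted, Running2, N - 1)
    end.

%% Alternative 1: check the permission before retrying.
run_checked(Permitted) -> loop_checked(Permitted, []).

loop_checked(Permitted, Running) ->
    case start(a, Running, Permitted) of
        {ok, Running1} ->
            {ok, Running1};
        {{error, {not_started, Dep}}, _} when not Permitted ->
            {error, {not_permitted, Dep}};
        {{error, {not_started, Dep}}, Running1} ->
            {ok, Running2} = start(Dep, Running1, Permitted),
            loop_checked(Permitted, Running2)
    end.
```

With the permission set, both `run(true, 10)` and `run_checked(true)` return `{ok, [a, b]}`. Without it, `run(false, 10)` only gives up after ten rounds, each of which would have added another `start_p_false` entry in the real controller, while `run_checked(false)` returns `{error, {not_permitted, b}}` immediately.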

BR,
Ulf W

On Sat, Mar 20, 2021 at 1:45 PM Mikael Pettersson <mikpelinux@REDACTED>
wrote:

> On Sat, Mar 20, 2021 at 8:46 AM Ulf Wiger <ulf@REDACTED> wrote:
> >
> > I had the brilliant idea of using application permissions for a
> particular use case. This seemed to work perfectly, until I ran `rebar3
> shell`, and spotted some disturbing behavior.
> >
> > The bug, apparently, is that `application:ensure_all_started(A)`
> ends up busy-looping if A depends on B and permission(B) -> false. What's
> worse, for each call to start(B), the application controller notices the
> permission flag, returns `ok` and inserts an entry in its internal
> `start_p_false` list. This amounts to a memory leak.
> >
> > I commented on it in a tweet, then decided to try to find the source,
> especially since I suspected `application:ensure_all_started/1`.
> >
> > https://twitter.com/uwiger/status/1372944356781531136
> >
> > In short, if permission(B) -> false, what happens is:
> > start(A) -> {error, {not_started, B}}
> > start(B) -> ok
> > start(A) -> {error, {not_started, B}}
> > ... [repeat endlessly]
> >
> > Now, it could be fixed by adding a permission check in the looping
> function, but this raises the question of what should happen in the above
> case. Three alternatives:
> >
> > 1. ensure_all_started(A) returns {error, {not_permitted, B}}, or
> something
> > 2. the call hangs until the flag(s) change, but start(B) is only called
> once.
> > 3. Warn against the use of permissions in the docs, and deprecate them.
> >
> > I suspect that most of you may not even know about permissions. They
> were introduced back in 1996-97 (I believe), when Martin Björklund and I
> were going back and forth on how to support distributed applications and
> cluster control. Eventually, this led to dist_ac and the protocol being
> defined, so that users could write a controller app taking control of an
> application and giving instructions on where it should run. In the AXD301,
> this was done by the RCM application. I believe I talked about it at the
> EUC 1997, but it's hard to find information about that on the web. :)
> >
> > Anyway, permissions were left in the API, and ARE documented.
> >
> > Thoughts?
>
> I know we've used the permissions mechanism occasionally during
> maintenance or live upgrades. Off-hand I don't know if we'd want
> alternative 1 or 2 (my colleague Daniel Szoboszlay might know more
> about this).
>
> /Mikael
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20210322/104f7c3b/attachment.htm>