BUG: fatal interaction between application:ensure_all_started(A) and permit(B, false)

Ulf Wiger ulf@REDACTED
Mon Mar 22 18:06:43 CET 2021

Hmm, trying some more with OTP 24: it addresses the problem with the memory
growth, but ensure_all_started still isn't permission-aware.

Consider test apps a and b, where a depends on b.

15> application:permit(b,false).
16> application:ensure_all_started(a).
17> application:which_applications().
[{stdlib,"ERTS  CXC 138 10","3.15"},
 {kernel,"ERTS  CXC 138 10","8.0"}]
18> application:permit(b,true).
19> application:which_applications().
[{b,"test app","0.1"},
 {stdlib,"ERTS  CXC 138 10","3.15"},
 {kernel,"ERTS  CXC 138 10","8.0"}]

The call to application:ensure_all_started(a) fails, and supposedly all
child apps that were started will have been stopped again, and it does look
that way.

But if we later permit b to run, it turns out that the start request wasn't
actually removed, and b pops up.

This is for sure a much less serious problem than the previous one.
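
For reference, a minimal sketch of that previous problem (the one described
in the quoted message below), using a plain list as a stand-in for the
controller's `start_p_false` state. The module name, the {App, Tag} tuple
shape and the request tags are hypothetical and not the controller's actual
data; the only library behaviour relied on is that lists:keydelete/3 deletes
just the first matching tuple.

```erlang
-module(spf_sketch).
-export([run/0]).

%% Hypothetical model: each start request made while permission is false
%% prepends one {App, Tag} tuple to the pending list.
add_request(App, Tag, Pending) ->
    [{App, Tag} | Pending].

run() ->
    %% Three start requests for b while permission(b) is false.
    Pending = lists:foldl(fun(Tag, Acc) -> add_request(b, Tag, Acc) end,
                          [], [req1, req2, req3]),
    %% lists:keydelete/3 removes only the first tuple whose first element
    %% is b, i.e. the most recently added request.
    AfterKeydelete = lists:keydelete(b, 1, Pending),
    %% Dropping every entry for b needs a filter instead.
    AfterFilter = [E || {App, _} = E <- Pending, App =/= b],
    {AfterKeydelete, AfterFilter}.
```

spf_sketch:run() returns {[{b,req2},{b,req1}], []}: two of the three
entries survive the keydelete, while the filter clears them all.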

However, I'm not sure if returning an error is actually the right thing to
do there. The call SHOULD probably hang.


Ulf W

On Mon, Mar 22, 2021 at 1:23 PM Ulf Wiger <ulf@REDACTED> wrote:

> When I started looking closer into this, it would appear as if there is a
> long-standing bug in the application_controller regarding permissions.
> And with "long-standing" I mean that it was there even when Kostis did
> some Tidier-based cleanup 11 years ago. Kostis didn't introduce it, though.
>
> When servicing a start request, the application_controller, if
> permission(App) == false, adds a new entry to the `start_p_false` list,
> i.e. a new entry for each request.
>
> https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L689-L690
>
> ... but when servicing a subsequent {permit_application, App, true}, it
> uses lists:keydelete/3 to remove the App from the `start_p_false` list.
>
> https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L759-L761
>
> lists:keydelete/3 obviously only removes the first matching entry.
> Earlier in that function, it also only locates the first pending request
> (or rather, chronologically the last), and uses the `From` in
> `spawn_starter()`.
>
> The rest of the pending requests should be handled somewhere - likely in
> `handle_application_started/3`, but aren't.
>
> BR,
> Ulf W
>
> On Sat, Mar 20, 2021 at 1:45 PM Mikael Pettersson <mikpelinux@REDACTED>
> wrote:
>
>> On Sat, Mar 20, 2021 at 8:46 AM Ulf Wiger <ulf@REDACTED> wrote:
>> >
>> > I had the brilliant idea of using application permissions for a
>> > particular use case. This seemed to work perfectly, until I ran `rebar3
>> > shell`, and spotted some disturbing behavior.
>> >
>> > The bug, apparently, lies in that `application:ensure_all_started(A)`
>> > ends up busy-looping if A depends on B, and permission(B) -> false. What's
>> > worse, for each call to start(B), the application controller notices the
>> > permission flag, returns `ok` and inserts an entry in its internal
>> > `start_p_false` list. This amounts to a memory leak.
>> >
>> > I commented it in a tweet, then decided to try to find the source, esp.
>> > since I suspected `application:ensure_all_started/1`.
>> >
>> > https://twitter.com/uwiger/status/1372944356781531136
>> >
>> > In short, if permission(B) -> false, what happens is:
>> > start(A) -> {error, {not_started, B}}
>> > start(B) -> ok
>> > start(A) -> {error, {not_started, B}}
>> > ... [repeat endlessly]
>> >
>> > Now, it could be fixed by adding a permission check in the looping
>> > function, but this raises the question of what should happen in the above
>> > case. Three alternatives:
>> >
>> > 1. ensure_all_started(A) returns {error, {not_permitted, B}}, or
>> >    something
>> > 2. the call hangs until the flag(s) change, but start(B) is only called
>> >    once.
>> > 3. Warn against the use of permissions in the docs, and deprecate them.
>> >
>> > I'm assuming that most of you may not even know about permissions. They
>> > were introduced back in 1996-97 (I believe), when I and Martin Björklund
>> > were going back and forth on how to support distributed applications and
>> > cluster control. Eventually, this led to dist_ac and the protocol being
>> > defined, so that users could write a controller app taking control of an
>> > application and giving instructions on where it should run. In the AXD301,
>> > this was done by the RCM application. I believe I talked about it at the
>> > EUC 1997, but it's hard to find information about that on the web. :)
>> >
>> > Anyway, permissions were left in the API, and ARE documented.
>> >
>> > Thoughts?
>>
>> I know we've used the permissions mechanism occasionally during
>> maintenance or live upgrades. Off-hand I don't know if we'd want
>> alternative 1 or 2 (my colleague Daniel Szoboszlay might know more
>> about this).
>>
>> /Mikael