<div dir="ltr">Hmm, I tried some more with OTP 24. It addresses the problem with the memory growth, but still isn't permission-aware.<div><br></div><div>Consider test apps a and b, where a depends on b.</div><div><br></div><div>15> application:permit(b,false).<br>ok<br>16> application:ensure_all_started(a).<br>{error,{a,{not_started,b}}}<br>17> application:which_applications().<br>[{stdlib,"ERTS CXC 138 10","3.15"},<br> {kernel,"ERTS CXC 138 10","8.0"}]<br>18> application:permit(b,true).<br>ok<br>19> application:which_applications().<br>[{b,"test app","0.1"},<br> {stdlib,"ERTS CXC 138 10","3.15"},<br> {kernel,"ERTS CXC 138 10","8.0"}]<br></div><div><br></div><div>The call to application:ensure_all_started(a) fails, and supposedly all child apps that were started will have been stopped again, and it does look that way.</div><div><br></div><div>But if we later permit b to run, it turns out that the start request wasn't actually removed, and b pops up.</div><div><br></div><div>This is for sure a much less serious problem than the previous one.</div><div><br></div><div>However, I'm not sure if returning an error is actually the right thing to do there. The call SHOULD probably hang.</div><div><br></div><div>Comments?</div><div><br></div><div>BR,</div><div>Ulf W</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 22, 2021 at 1:23 PM Ulf Wiger <<a href="mailto:ulf@wiger.net">ulf@wiger.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">When I started looking more closely into this, it appeared that there is a long-standing bug in the application_controller regarding permissions.<div><br></div><div>And by "long-standing" I mean that it was there even when Kostis did some Tidier-based cleanup 11 years ago.
Kostis didn't introduce it, though.</div><div><br></div><div>When servicing a start request while permission(App) == false, the application_controller adds a new entry to the `start_p_false` list, i.e. one entry per request.</div><div><a href="https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L689-L690" target="_blank">https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L689-L690</a><br></div><div><br></div><div>... but when servicing a subsequent {permit_application, App, true}, it uses lists:keydelete/3 to remove the App from the `start_p_false` list.</div><div><a href="https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L759-L761" target="_blank">https://github.com/erlang/otp/blob/master/lib/kernel/src/application_controller.erl#L759-L761</a><br></div><div><br></div><div>lists:keydelete/3 obviously only removes the first matching entry.</div><div><br></div><div>Earlier in that function, it also only locates the first pending request (or rather, chronologically the last), and uses the `From` in `spawn_starter()`.</div><div><br></div><div>The rest of the pending requests should be handled somewhere - likely in `handle_application_started/3` - but aren't.</div><div><br></div><div>BR,</div><div>Ulf W</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Mar 20, 2021 at 1:45 PM Mikael Pettersson <<a href="mailto:mikpelinux@gmail.com" target="_blank">mikpelinux@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Sat, Mar 20, 2021 at 8:46 AM Ulf Wiger <<a href="mailto:ulf@wiger.net" target="_blank">ulf@wiger.net</a>> wrote:<br>
><br>
> I had the brilliant idea of using application permissions for a particular use case. This seemed to work perfectly, until I ran `rebar3 shell`, and spotted some disturbing behavior.<br>
><br>
> The bug, apparently, is that `application:ensure_all_started(A)` ends up busy-looping if A depends on B, and permission(B) -> false. What's worse, for each call to start(B), the application controller notices the permission flag, returns `ok` and inserts an entry in its internal `start_p_false` list. This amounts to a memory leak.<br>
><br>
> I commented on it in a tweet, then decided to try to find the source, esp. since I suspected `application:ensure_all_started/1`.<br>
><br>
> <a href="https://twitter.com/uwiger/status/1372944356781531136" rel="noreferrer" target="_blank">https://twitter.com/uwiger/status/1372944356781531136</a><br>
><br>
> In short, if permission(B) -> false, what happens is:<br>
> start(A) -> {error, {not_started, B}}<br>
> start(B) -> ok<br>
> start(A) -> {error, {not_started, B}}<br>
> ... [repeat endlessly]<br>
><br>
> Now, it could be fixed by adding a permission check in the looping function, but this raises the question of what should happen in the above case. Three alternatives:<br>
><br>
> 1. ensure_all_started(A) returns {error, {not_permitted, B}}, or something<br>
> 2. the call hangs until the flag(s) change, but start(B) is only called once.<br>
> 3. Warn against the use of permissions in the docs, and deprecate them.<br>
><br>
> I'm assuming that most of you may not even know about permissions. They were introduced back in 1996-97 (I believe), when Martin Björklund and I were going back and forth on how to support distributed applications and cluster control. Eventually, this led to dist_ac and the protocol being defined, so that users could write a controller app taking control of an application and giving instructions on where it should run. In the AXD301, this was done by the RCM application. I believe I talked about it at the EUC 1997, but it's hard to find information about that on the web. :)<br>
><br>
> Anyway, permissions were left in the API, and ARE documented.<br>
><br>
> Thoughts?<br>
<br>
I know we've used the permissions mechanism occasionally during<br>
maintenance or live upgrades. Off-hand I don't know if we'd want<br>
alternative 1 or 2 (my colleague Daniel Szoboszlay might know more<br>
about this).<br>
<br>
/Mikael<br>
</blockquote></div>
</blockquote></div>
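<div dir="ltr"><div>For reference, the `start_p_false` bookkeeping discussed in this thread can be illustrated with a small toy model. This is only a sketch, not the application_controller source: the module name, the function names, and the assumption that new entries are prepended to the list are all hypothetical. It shows that lists:keydelete/3 drops a single entry per call, so earlier start requests for the same app stay in the list, and how a lists:partition/2 based variant would drain all of them.</div><div><br></div><pre>
-module(start_p_false_model).
-export([request_start/3, permit/2, permit_all/2, demo/0]).

%% Hypothetical model: one {App, From} entry is added for every start
%% request made while App is not permitted (assumed to be prepended).
request_start(App, From, Pending) ->
    [{App, From} | Pending].

%% Models the reported behaviour: find the first matching entry
%% (chronologically the last request) and delete only that one.
permit(App, Pending) ->
    case lists:keyfind(App, 1, Pending) of
        {App, From} ->
            {{reply_to, From}, lists:keydelete(App, 1, Pending)};
        false ->
            {none, Pending}
    end.

%% Variant that collects every pending caller for App in one pass.
permit_all(App, Pending) ->
    IsApp = fun({A, _From}) -> A =:= App end,
    {Matching, Rest} = lists:partition(IsApp, Pending),
    {_Apps, Froms} = lists:unzip(Matching),
    {Froms, Rest}.

%% Three start requests for b, then b is permitted.
demo() ->
    P1 = request_start(b, caller1, []),
    P2 = request_start(b, caller2, P1),
    P3 = request_start(b, caller3, P2),
    %% Only the most recent request is served; two entries remain.
    {{reply_to, caller3}, Left} = permit(b, P3),
    [{b, caller2}, {b, caller1}] = Left,
    %% The draining variant leaves nothing behind for b.
    {[caller3, caller2, caller1], []} = permit_all(b, P3),
    ok.
</pre><div>Calling start_p_false_model:demo() returns ok if the matches hold. Whether the leftover callers should get a reply, or be dropped, is exactly the open question in this thread; the model takes no position on that.</div></div>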