<div dir="ltr"><div><div><div><div><div>I think brutal_kill is a bad choice for supervisor children. When a supervisor process exits it is very beneficial to know that (except for very exceptional circumstances) every process below it in the supervisor tree has exited. If these processes lower down the supervision have not shutdown, they are now orphaned with no guarantee they will be shutdown/exit.<br>
<br></div>The issue materialises when the child supervisor has a worker that traps exits. The worker might be slow to terminate, perhaps because it has alot of messages queued, makes a long blocking call or has got stuck in a call that will never return due to a bug. When properly terminated by it's parent, the worker might be killed or given a shutdown timeout but this does not come into effect because its parent was killed, so the worker remains alive for an undetermined period of time.<br>
<br></div>If/When the worker's parent is restarted a new version of the worker can not be (re)started if the worker is registered because the old worker is still alive. This could cause the supervisor that brutal kills it's child supervisor to max out it's restart limit and continue to do so up the supervision tree. This could have been avoided if the worker's parent had been allowed to carry out its duty as a supervisor to terminate its children.<br>
<br></div>Of course if the worker is not registered a new version can be restarted. However in this case there could now be two versions of this worker when the application was only designed to have one running at a time. This bug probably never crops up in testing, perhaps because it was assumed the worker's supervisor would prevent this case.<br>
<br></div>Note that everywhere above where I say the child is a worker, I could have used the child type supervisor! This problem could go all the way down the supervision tree.<br><br></div><div>I do not follow the view that different defaults for different types is too complicated, it is one of these least complicated configuration options in a supervisor spec and there are only two types.<br>
</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 1 July 2014 12:06, Loïc Hoguin <span dir="ltr"><<a href="mailto:essen@ninenines.eu" target="_blank">essen@ninenines.eu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Just wanted to say I agree with everything José said, except 3. the different defaults for different process types. It makes things too hard to keep track of.<br>
<br>
I think brutal_kill is a great choice because it makes it easier for the user to stumble upon cold-start issues when restarting their application/release. The VM may crash so there's never a guarantee to run shutdown functions anyway, and too few people forget to test cases when these didn't run, leading to potential production disasters.<div>
<div class="h5"><br>
<br>
On 06/30/2014 04:37 PM, José Valim wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
I have some feedback based on my experience in writing and teaching<br>
Erlang/Elixir.<br>
<br>
1. I would always require the supervision strategy. When I am<br>
prototyping, it is usually hard for me to reason about the intensity and<br>
restarts, but I can always reason about the supervision strategy. All<br>
teaching materials I know of also discuss the supervision right away, so<br>
having it explicit shouldn't hurt those cases.<br>
<br>
For such cases, maybe we should use the format of {{strategy(), map()},<br>
[children()]} (which is similar to what we have today).<br>
<br>
2. I also agree with Richard. Having a default of 0 restarts may cause<br>
confusion and forcing people to define them won't help when they still<br>
have no idea of what values they should use (because it is prototyping /<br>
early deploys). I usually fine tune those values based on the production<br>
system reports so having saner defaults, like Ulf's, may make more sense.<br>
<br>
3. The child_spec defaults are great and similar to the ones we use in<br>
Elixir with the only difference being the shutdown value. We have a<br>
default of 5000, which is the one usually recommended, but we set it to<br>
"infinity" if the type is a "supervisor" (which is the OTP docs<br>
recommendation if I remember correctly).<br>
<br>
Regarding 3., if the recommendation and the consensus in actual systems<br>
is to use brutal_kill for shutdown, I will gladly change Elixir to make<br>
it consistent with OTP defaults.<br>
<br>
Thanks for opening the changes for discussion!<br>
<br></div></div>
*José Valim*<br>
<a href="http://www.plataformatec.com.br" target="_blank">www.plataformatec.com.br</a> <<a href="http://www.plataformatec.com.br/" target="_blank">http://www.plataformatec.com.<u></u>br/</a>><div class=""><br>
Skype: jv.ptec<br>
Founder and Lead Developer<br>
<br>
<br>
<br>
<br></div><div class="">
______________________________<u></u>_________________<br>
erlang-patches mailing list<br>
<a href="mailto:erlang-patches@erlang.org" target="_blank">erlang-patches@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-patches" target="_blank">http://erlang.org/mailman/<u></u>listinfo/erlang-patches</a><br>
<br>
</div></blockquote><span class="HOEnZb"><font color="#888888">
<br>
-- <br>
Loïc Hoguin<br>
<a href="http://ninenines.eu" target="_blank">http://ninenines.eu</a></font></span><div class="HOEnZb"><div class="h5"><br>
______________________________<u></u>_________________<br>
erlang-patches mailing list<br>
<a href="mailto:erlang-patches@erlang.org" target="_blank">erlang-patches@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-patches" target="_blank">http://erlang.org/mailman/<u></u>listinfo/erlang-patches</a><br>
</div></div></blockquote></div><br></div>