Hello Erlangers,<div><br></div><div>I've recently tracked down a bug in some code that seems to be caused by a problem with the supervisor module. When a temporary child has been added to a one_for_all supervisor using start_child and a permanent child of that supervisor exits, the supervisor attempts to restart the temporary child as well (which seems counter to the documentation). In addition, the argument list in the temporary child's MFA has been replaced with undefined, so the start attempt crashes, causing the supervisor to keep attempting restarts until it reaches its restart intensity and shuts down.</div>
<div><br></div><div>This does not happen if the temporary child is added to the supervisor as part of the supervisor's initial spec. In that case the temporary child is restarted, but its MFA is intact, so the restart succeeds. It still seems contrary to the documentation that the temporary child is restarted at all, although that depends on which section of the documentation you regard as more important (from the supervisor documentation):</div>
<div><br></div><div><li>
        <p><span class="code">one_for_all</span> - if one child process terminates and
          should be restarted, all other child processes are terminated
          and then all child processes are restarted.</p><p>This is quite clear: "all child processes are restarted". But later:</p><p></p></li><li>
        <p><span class="code">Restart</span> defines when a terminated child process
          should be restarted. A <span class="code">permanent</span> child process should
          always be restarted, a <span class="code">temporary</span> child process should
          never be restarted and a <span class="code">transient</span> child process
          should be restarted only if it terminates abnormally, i.e.
          with another exit reason than <span class="code">normal</span>.</p></li><p></p></div><div>Again clear but conflicting: "a <span class="code">temporary</span> child process should never be restarted".</div>
<div><br></div><div>I found a similar issue mentioned in a bug report for R14B02, here:</div><div><br></div><div><a href="http://erlang.org/pipermail/erlang-bugs/2011-March/002273.html">http://erlang.org/pipermail/erlang-bugs/2011-March/002273.html</a></div>
<div><br></div><div>But in that case it seemed necessary to call restart_child(), which makes it much less of a problem.</div><div><br></div><div>I'm testing on R14B03 and I've produced a small piece of code to reproduce the problem:</div>
<div><br></div><div>--- begin ---</div><div><br></div><div><div>-module(bug).</div><div>-behaviour(supervisor).</div><div>-export([test_one/0, test_two/0, spec/2, init/1, main/1]).</div><div><br></div><div>test_one() -></div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>application:start(sasl),</div><div><span class="Apple-tab-span" style="white-space:pre">     </span>supervisor:start_link({local, sup}, ?MODULE, [spec(foo, permanent)]),</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>supervisor:start_child(sup, spec(bar, temporary)),</div><div><span class="Apple-tab-span" style="white-space:pre">   </span>foo ! die.</div><div><br></div><div>
test_two() -></div><div><span class="Apple-tab-span" style="white-space:pre">  </span>application:start(sasl),</div><div><span class="Apple-tab-span" style="white-space:pre">     </span>supervisor:start_link({local, sup}, ?MODULE, [spec(foo, permanent), spec(bar, temporary)]),</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>foo ! die.</div><div><br></div><div>spec(Name, Type) -></div><div><span class="Apple-tab-span" style="white-space:pre">     </span>{<span class="Apple-tab-span" style="white-space:pre">   </span>Name,</div>
<div><span class="Apple-tab-span" style="white-space:pre">              </span>{proc_lib, start_link, [?MODULE, main, [Name]]},</div><div><span class="Apple-tab-span" style="white-space:pre">             </span>Type,</div><div><span class="Apple-tab-span" style="white-space:pre">                </span>3000,</div>
<div><span class="Apple-tab-span" style="white-space:pre">              </span>worker,</div><div><span class="Apple-tab-span" style="white-space:pre">              </span>[bug]</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>}.</div>
<div><br></div><div>init(Children) -></div><div><span class="Apple-tab-span" style="white-space:pre">  </span>{ok, {{one_for_all, 3, 10000}, Children}}.</div><div><br></div><div>main(Name) -></div><div><span class="Apple-tab-span" style="white-space:pre">   </span>register(Name, self()),</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>proc_lib:init_ack({ok, self()}),</div><div><span class="Apple-tab-span" style="white-space:pre">     </span>receive</div><div><span class="Apple-tab-span" style="white-space:pre">              </span>die -> ok</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>end.</div></div><div><br></div><div>--- end ---</div><div><br></div><div>Running test_one from the shell produces this output on my system:</div><div><br></div>
<div><div>$ erl</div><div>Erlang R14B03 (erts-5.8.4) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]</div><div><br></div><div>Eshell V5.8.4  (abort with ^G)</div><div>1> bug:test_one().</div><div>
** exception exit: shutdown</div><div>2></div><div><br></div><div>[snip SASL startup]</div><div><br></div><div>=PROGRESS REPORT==== 3-Aug-2011::11:06:40 ===</div><div>          supervisor: {local,sup}</div></div><div><div>
             started: [{pid,<0.44.0>},</div><div>                       {name,foo},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[foo]]}},</div><div>                       {restart_type,permanent},</div>
<div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div><br></div><div>=PROGRESS REPORT==== 3-Aug-2011::11:06:40 ===</div><div>          supervisor: {local,sup}</div><div>
             started: [{pid,<0.45.0>},</div><div>                       {name,bar},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[bar]]}},</div><div>                       {restart_type,temporary},</div>
<div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div><br></div><div>=SUPERVISOR REPORT==== 3-Aug-2011::11:06:40 ===</div><div>     Supervisor: {local,sup}</div><div>
     Context:    child_terminated</div><div>     Reason:     normal</div><div>     Offender:   [{pid,<0.44.0>},</div><div>                  {name,foo},</div><div>                  {mfargs,{proc_lib,start_link,[bug,main,[foo]]}},</div>
<div>                  {restart_type,permanent},</div><div>                  {shutdown,3000},</div><div>                  {child_type,worker}]</div><div><br></div><div>=PROGRESS REPORT==== 3-Aug-2011::11:06:40 ===</div><div>
          supervisor: {local,sup}</div><div>             started: [{pid,<0.46.0>},</div><div>                       {name,foo},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[foo]]}},</div>
<div>                       {restart_type,permanent},</div><div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div><br></div><div>=SUPERVISOR REPORT==== 3-Aug-2011::11:06:40 ===</div>
<div>     Supervisor: {local,sup}</div><div>     Context:    start_error</div><div>     Reason:     {'EXIT',</div><div>                     {badarg,</div><div>                         [{erlang,apply,[proc_lib,start_link,undefined]},</div>
<div>                          {supervisor,do_start_child,2},</div><div>                          {supervisor,start_children,3},</div><div>                          {supervisor,restart,3},</div><div>                          {supervisor,handle_info,2},</div>
<div>                          {gen_server,handle_msg,5},</div><div>                          {proc_lib,init_p_do_apply,3}]}}</div><div>     Offender:   [{pid,undefined},</div><div>                  {name,bar},</div><div>
                  {mfargs,{proc_lib,start_link,undefined}},</div><div>                  {restart_type,temporary},</div><div>                  {shutdown,3000},</div><div>                  {child_type,worker}]</div><div><br>
</div></div><div>[supervisor loops until it shuts down]</div><div><br></div><div>Running test_two() shows it restarting the temporary child:</div><div><br></div><div><div>2> bug:test_two().</div><div>die</div><div><br>
</div><div>=PROGRESS REPORT==== 3-Aug-2011::11:09:23 ===</div><div>          supervisor: {local,sup}</div><div>             started: [{pid,<0.52.0>},</div><div>                       {name,foo},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[foo]]}},</div>
<div>                       {restart_type,permanent},</div><div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div>3> </div><div>=PROGRESS REPORT==== 3-Aug-2011::11:09:23 ===</div>
<div>          supervisor: {local,sup}</div><div>             started: [{pid,<0.53.0>},</div><div>                       {name,bar},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[bar]]}},</div>
<div>                       {restart_type,temporary},</div><div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div><br></div><div>=SUPERVISOR REPORT==== 3-Aug-2011::11:09:23 ===</div>
<div>     Supervisor: {local,sup}</div><div>     Context:    child_terminated</div><div>     Reason:     normal</div><div>     Offender:   [{pid,<0.52.0>},</div><div>                  {name,foo},</div><div>                  {mfargs,{proc_lib,start_link,[bug,main,[foo]]}},</div>
<div>                  {restart_type,permanent},</div><div>                  {shutdown,3000},</div><div>                  {child_type,worker}]</div><div><br></div><div><br></div><div>=PROGRESS REPORT==== 3-Aug-2011::11:09:23 ===</div>
<div>          supervisor: {local,sup}</div><div>             started: [{pid,<0.54.0>},</div><div>                       {name,foo},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[foo]]}},</div>
<div>                       {restart_type,permanent},</div><div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div><br></div><div>=PROGRESS REPORT==== 3-Aug-2011::11:09:23 ===</div>
<div>          supervisor: {local,sup}</div><div>             started: [{pid,<0.55.0>},</div><div>                       {name,bar},</div><div>                       {mfargs,{proc_lib,start_link,[bug,main,[bar]]}},</div>
<div>                       {restart_type,temporary},</div><div>                       {shutdown,3000},</div><div>                       {child_type,worker}]</div><div><br></div></div><div><br></div><div>Am I doing something wrong or is this actually a bug (or bugs)?</div>
<div><br></div><div>Peace,</div><div>Sam.</div>