<div dir="ltr">At the time, I met with the OTP team to discuss it, and they eventually agreed that it was a good thing. However, it needed more work to be accepted, and I had also noticed a couple of weaknesses in the implementation that I needed to address. I never found the time to work on it further.<div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><br> /Richard</div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 2 May 2021 at 21:01, Nicolas Martyanoff <<a href="mailto:khaelin@gmail.com">khaelin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Hi,<br>
<br>
I originally posted this email to erlang-patches, but I have just realized<br>
that most developers are on erlang-questions instead. I believe this could be<br>
of interest.<br>
<br>
<br>
Nine years ago, Richard Carlsson submitted an interesting patch [1]<br>
that makes it possible to delay the restart of failed children in supervisors.<br>
<br>
After a brief discussion, the official answer was that the OTP team<br>
would discuss it [2]. There are no further messages on the mailing<br>
list.<br>
<br>
Was there an official response?<br>
<br>
I have various supervisors whose children handle network connections.<br>
When something goes wrong with the connection, children die and are<br>
immediately restarted. Most of the time, errors are transient (remote<br>
server restarting, temporary network issue, etc.), but retrying without<br>
any delay is pretty much guaranteed to fail again. And of course, after a<br>
few retries, the application dies, which is unacceptable.<br>
<br>
This kind of behaviour is a huge problem: it fills logs with multiple<br>
copies of identical errors and causes a system failure.<br>
<br>
In general, if I could, I would use restart delays with exponential<br>
backoff everywhere because in practice, restarting immediately is almost<br>
never the right approach: code errors do not disappear when restarting<br>
so they are going to get triggered again immediately, and external errors<br>
are not magically fixed by retrying without any delay.<br>
<br>
Is there still interest in this patch?<br>
<br>
[1] <a href="https://erlang.org/pipermail/erlang-patches/2012-January/002575.html" rel="noreferrer" target="_blank">https://erlang.org/pipermail/erlang-patches/2012-January/002575.html</a><br>
[2] <a href="https://erlang.org/pipermail/erlang-patches/2012-January/002597.html" rel="noreferrer" target="_blank">https://erlang.org/pipermail/erlang-patches/2012-January/002597.html</a><br>
<br>
-- <br>
Nicolas Martyanoff<br>
<a href="http://snowsyn.net" rel="noreferrer" target="_blank">http://snowsyn.net</a><br>
<a href="mailto:khaelin@gmail.com" target="_blank">khaelin@gmail.com</a><br>
</blockquote></div>