delayed child restart with incremental back-off
Loïc Hoguin
essen@REDACTED
Sun May 2 21:27:28 CEST 2021
I have not looked at the patch, but something like this would be good to
have. Then we could get rid of supervisor2 in RabbitMQ (see
https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/supervisor2.erl#L15
for the delay part; in our case the delay does not back off).
I was going to see whether Maria/Jan were interested in providing a patch
for this as well, so I'm glad that there are others showing interest.
Cheers,
On 02/05/2021 21:00, Nicolas Martyanoff wrote:
>
> Hi,
>
> I originally posted this email on erlang-patches, but I just realized
> most developers are on erlang-questions instead. I believe this could be
> of interest.
>
>
> Nine years ago, Richard Carlsson submitted an interesting patch [1]
> that makes it possible to delay the re-creation of failed children in
> supervisors.
>
> After a brief discussion, the official answer was that the OTP team
> would discuss it [2]. There are no further messages on the mailing
> list.
>
> Was there an official response?
>
> I have various supervisors whose children handle network connections.
> When something goes wrong with the connection, children die and are
> immediately restarted. Most of the time, errors are transient (remote
> server restarting, temporary network issue, etc.), but retrying without
> any delay is pretty much guaranteed to fail again. And of course, after
> a few retries the supervisor's maximum restart intensity is exceeded
> and the application dies, which is unacceptable.
>
> This kind of behaviour is a huge problem: it fills logs with multiple
> copies of identical errors and causes a system failure.
>
> In general, if I could, I would use restart delays with exponential
> backoff everywhere because in practice, restarting immediately is almost
> never the right approach: code errors do not disappear when restarting,
> so they are triggered again immediately, and external errors are not
> magically fixed by retrying without any delay.
>
> Is there still interest in this patch?
>
> [1] https://erlang.org/pipermail/erlang-patches/2012-January/002575.html
> [2] https://erlang.org/pipermail/erlang-patches/2012-January/002597.html
>
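As a rough illustration of the restart delay with exponential back-off described above, the delay before each restart can double up to a cap and reset once the child has stayed up long enough. This is not the API of Richard Carlsson's patch, of RabbitMQ's supervisor2, or of OTP's supervisor; the class name RestartBackoff, its parameters (base_ms, max_ms, stable_ms), and the reset-after-stable-uptime rule are hypothetical choices made for this sketch, written in Python for brevity.

```python
class RestartBackoff:
    """Hypothetical helper: per-child restart delay with exponential back-off."""

    def __init__(self, base_ms: int = 100, max_ms: int = 30_000,
                 stable_ms: int = 60_000) -> None:
        self.base_ms = base_ms      # delay before the first restart
        self.max_ms = max_ms        # upper bound on any single delay
        self.stable_ms = stable_ms  # uptime after which the child counts as healthy
        self.failures = 0           # consecutive failures seen so far

    def on_exit(self, uptime_ms: int) -> int:
        """Return the delay (ms) to wait before restarting a child that just exited."""
        if uptime_ms >= self.stable_ms:
            # The child ran long enough, so this exit starts a fresh failure streak.
            self.failures = 0
        # Double the delay for each consecutive failure, never exceeding max_ms.
        delay = min(self.max_ms, self.base_ms * 2 ** self.failures)
        self.failures += 1
        return delay


if __name__ == "__main__":
    backoff = RestartBackoff()
    # Ten quick consecutive crashes:
    print([backoff.on_exit(uptime_ms=0) for _ in range(10)])
    # [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 30000]
```

A fixed, non-back-off delay like the one mentioned for RabbitMQ would simply return the same value on every call; the cap keeps a persistently failing child from being retried arbitrarily rarely, while the reset keeps an occasional failure after long uptime from inheriting an old, long delay.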
--
Loïc Hoguin
https://ninenines.eu