EEP proposal - Delayed restarts of supervisor children

Fred Hebert mononcqc@REDACTED
Mon Jun 21 17:21:10 CEST 2021


On Mon, Jun 21, 2021 at 6:15 AM Maria Scott <maria-12648430@REDACTED>
wrote:

> When I try to put myself in the shoes of a user making such a request,
> what likely brings me to it is the fact that I have a rather simple
> problem: restarts happen too fast for the external resource my client
> depends upon. The solution seems also simple, if I could just make a
> supervisor wait a bit between restarts. How hard can it be to just wait?
> And, from the point of view of my specific, limited use case, I'm probably
> right, simple waiting would help, and no harm done.
>
> But now I'm told that instead I have to do it in a comparatively
> complicated way, that instead of having a knob to turn restart delays up
> and down as needed in the few spots where I need it, I have to implement
> backoff/retry logic etc myself in my client. If my client is really a 3rd
> party thingie that doesn't provide any backoff/retry logic itself, I'll
> probably have to write a wrapper around it, too. And do it all over again
> in other similar clients of mine, where otherwise I would just use the same
> knob, if only it was there!
> The IMO crucial point: I won't get my simple knob _for reasons that don't
> apply to me and my simple, limited use case_.
>
> With that in mind, I think the frustration resulting from such requests is
> at least understandable. So I wonder if we can't do anything for them.
> Without touching the supervisor, or another central component for that
> matter ;)
>
>
Yeah, I definitely agree with this. The experience is currently not great,
and to me is very reminiscent of people asking "how do I make sure I can
deal with one of my servers going down" and being pointed to a bunch of
distributed system academic papers, intro blog posts to the CAP theorem,
and unfortunately being offered nearly no solutions aside from "learn a lot
of theory to make the right decisions for your application."

At the very least, we should find ways to provide guidance, some libraries,
demos or samples. We could also see whether there is a way to create a
"client" behaviour that takes the common state machine of disconnected -->
[connecting -->] connected and augments it with backoff or even circuit
breaker mechanisms ("give the name of the shared circuit breaker your
clients are using"). That would make it far easier for people to put the
fault-handling behaviour close to the error-handling mechanisms, keep the
decision making tied to app-specific concerns, and build extensible
mechanisms.
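
As a rough sketch of that state machine, assuming a gen_statem is an
acceptable starting point: the module name backoff_client, the ConnectFun
argument, the {connection_lost, Conn} message and the delay constants are
all hypothetical, picked for illustration only. It covers plain exponential
backoff without jitter; the optional connecting state and the circuit
breaker are left out.

```erlang
-module(backoff_client).
-behaviour(gen_statem).

-export([start_link/1, next_delay/1]).
-export([init/1, callback_mode/0, handle_event/4]).

%% Hypothetical bounds for the retry delay, in milliseconds.
-define(MIN_DELAY, 100).
-define(MAX_DELAY, 30000).

%% ConnectFun is a fun() -> {ok, Conn} | {error, Reason} supplied by the
%% application; this is an assumption of the sketch, not an existing API.
start_link(ConnectFun) ->
    gen_statem:start_link(?MODULE, ConnectFun, []).

callback_mode() -> handle_event_function.

%% Start disconnected and attempt the first connection right after init,
%% so a missing external resource never fails the supervisor's start.
init(ConnectFun) ->
    Data = #{connect => ConnectFun, delay => ?MIN_DELAY},
    {ok, disconnected, Data, [{next_event, internal, connect}]}.

%% disconnected: try to connect; on failure, wait Delay ms and retry,
%% doubling the delay for the following attempt.
handle_event(Type, connect, disconnected,
             Data = #{connect := Connect, delay := Delay})
  when Type =:= internal; Type =:= state_timeout ->
    case Connect() of
        {ok, Conn} ->
            {next_state, connected,
             Data#{conn => Conn, delay := ?MIN_DELAY}};
        {error, _Reason} ->
            {keep_state, Data#{delay := next_delay(Delay)},
             [{state_timeout, Delay, connect}]}
    end;
%% connected: a (hypothetical) loss notification sends us back to
%% disconnected, where the retry loop starts over.
handle_event(info, {connection_lost, Conn}, connected,
             Data = #{conn := Conn}) ->
    {next_state, disconnected, maps:remove(conn, Data),
     [{next_event, internal, connect}]};
%% Lets callers ask which state the client is in.
handle_event({call, From}, status, State, _Data) ->
    {keep_state_and_data, [{reply, From, State}]};
handle_event(_Type, _Event, _State, _Data) ->
    keep_state_and_data.

%% Exponential backoff, capped at ?MAX_DELAY.
next_delay(Delay) ->
    min(Delay * 2, ?MAX_DELAY).
```

With something like this, the process stays alive while the resource is
down, so the supervisor's restart intensity is never involved; callers can
check gen_statem:call(Pid, status) and decide what to do when the client is
disconnected.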

This could create a blessed path with safe defaults, obvious pointers to
the literature, and ways to dig into fancier approaches. It would also
reduce the amount of feature interplay and interleaving that lowers the
ceiling of what you can currently put in a supervisor before there's way
too much to consider.