<div dir="ltr"><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 21, 2021 at 6:15 AM Maria Scott &lt;<a href="mailto:maria-12648430@hnc-agency.org">maria-12648430@hnc-agency.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">When I try to put myself in the shoes of a user making such a request, what likely brings me to it is the fact that I have a rather simple problem: restarts happen too fast for the external resource my client depends upon. The solution also seems simple: if only I could make a supervisor wait a bit between restarts. How hard can it be to just wait? And, from the point of view of my specific, limited use case, I'm probably right: simple waiting would help, with no harm done.<br>

<br>

But now I'm told that I have to do it in a comparatively complicated way: instead of having a knob to turn restart delays up and down in the few spots where I need it, I have to implement backoff/retry logic etc. myself in my client. If my client is really a third-party thingie that doesn't provide any backoff/retry logic itself, I'll probably have to write a wrapper around it, too. And do it all over again in other similar clients of mine, where otherwise I would just use the same knob, if only it were there!<br>

The IMO crucial point: I won't get my simple knob _for reasons that don't apply to me and my simple, limited use case_.<br>

<br>

With that in mind, I think the frustration resulting from such requests is at least understandable. So I wonder if we can't do anything for them. Without touching the supervisor, or another central component for that matter ;)<br>

<br></blockquote><div><br></div><div>Yeah, I definitely agree with this. The experience is currently not great. To me it is very reminiscent of people asking "how do I make sure I can deal with one of my servers going down" and being pointed to a bunch of academic papers on distributed systems and intro blog posts on the CAP theorem, with nearly no solutions on offer aside from "learn a lot of theory to make the right decisions for your application."</div><div><br></div><div>At the very least, we should find ways to provide guidance, some libraries, demos, or samples. We could also see whether there is a way to create a "client" behaviour that takes the common state machine of disconnected --> [connecting -->] connected and augments it with backoff or even circuit breaker mechanisms ("give the name of the shared circuit breaker your clients are using"). That would make it far easier for people to put the fault-handling behaviour close to the error-handling mechanisms, tie the decision making to app-specific concerns, and build extensible mechanisms.</div><div><br></div><div>This could create a blessed path with safe defaults, obvious pointers to literature, and ways to dig into fancier approaches. It would also reduce the amount of feature interplay and interleaving that lowers the ceiling of what you can currently put in a supervisor before there's way too much to consider.<br></div><div><br></div><div><br></div></div></div>