Fwd: Re: EEP proposal - Delayed restarts of supervisor children

Thu Jun 17 16:39:01 CEST 2021

(forwarding/resending so everybody can follow ^^;)

> Hi Fred,
> 
> > I am against this proposal, for similar reasons I have opposed similar ones in the past. Most of my opposition has been written up before so I'm just going to link to it here: https://ferd.ca/it-s-about-the-guarantees.html
> 
> Not too unexpected, I know the article =^^= And while I agree with you in principle and to a large degree, I also see benefits in delayed restarts and can understand the people wanting them.
> 
> As Loic (sorry, no idea how you put those double dots over an i ^^;;;) pointed out, just because a feature exists does not mean that it must be used, or that it is a good fit for everything, or that it has to be the one way by which all things have to be done. The feature takes away nothing.
> Myself, I would go for doing things in the workers as you suggest if I reasonably can, but I would equally well go for delayed restarts if I see no reason against doing so.
> 
> Now I have to admit that this opens up new ways to f*ck things up, certainly. You point some of them out below. But that is true, give or take, of any feature when used carelessly. And IMO those are special cases where you decidedly would _not_ want to use delayed restarts.
> 
> Also, I presume that delayed restarts are most useful, and find their primary use cases, with the one_for_one strategies. TBH, figuring something out for one_for_all and rest_for_one was the biggest headache in all this, and we would actually have been glad if we could have ignored them. Nevertheless, they are there.
> 
> > I dislike the possibility of running vs. active children because arguably the caller would need to have a way to check for that and it would be absolutely terrible to have to ask the supervisor on every hot call path; that semantic distinction should IMO be implemented in the worker, as per my post above.
> 
> Sorry, I don't understand that one ^^; On what kind of calls would you want to ask the supervisor whether a child is running or not?
> 
> > I'm also not sure about using the max delay of all children when it applies to many. This makes sense as a preventive measure against being too aggressive, but actively prevents being able to consider distinct tasks as actually distinct.
> 
> Sure, but it ensures that tasks (or parts thereof) which _do_ belong together _stay_ together (in the sense that they are started together).
> 
> > Take the following example where you have a configuration handler starting with a very short backoff, in a one_for_all situation with a database client that relies on the configuration handler to start. The configuration handler may also be used by other workers for information and you want it available. This is set as a one_for_all configuration such that if the client to the database goes down, the whole ensemble restarts to provide a fresh DB config (in case the config changed!).
> > With these retry policies, the ability to provide a fresh config is now limited by the delay of the client wanting to retry connecting to the database, even though they refer to fundamentally different operations with different load profiles deserving different timers. The end result is that you'd end up having to split your supervision tree to make sure the timeouts in one child do not affect other ones. That's messy.
> 
> I understand, but well, in circumstances where delayed restarts are no good and could even do harm, you should just not use them. You don't _have_ to ^^;
> 
> > If anything of this proposal goes through I would argue in favor of not supporting incremental backoffs because this is either guaranteeing you're gonna get a basic, subpar implementation that still needs replacing with nicer libraries in use cases that need some refinement (which might be worth embracing for simplicity's sake), or sending you on the way of having supervisors which have most of their logic dedicated to not actually being supervisors but to actually doing good circuit breaking or exponential backoff with triggers (and no coordination) if you want to provide a more solid implementation.
> 
> I agree. I want to keep incremental backoffs out of this EEP, for just the reasons you pointed out.
> 
> Kind regards,
> Maria