delayed child restart with incremental back-off

Michael Truog mjtruog@REDACTED
Tue May 4 09:12:06 CEST 2021


On 5/3/21 11:15 PM, Nicolas Martyanoff wrote:
> zxq9 <zxq9@REDACTED> writes:
>
>> You don't have to implement your own supervisor to get this kind of behavior,
>> simply move connection out of initialization. As a general rule initialization
>> should never be dependent on anything outside your node's control --
>> especially not something across the network.
> I do not know why there is such a focus on initialization. Errors can
> occur during the entire lifecycle of a process; it is common to end up
> in a situation where a worker will fail *after* initialization, and this
> failure will repeat due to external consequences or to a coding mistake.
> In that situation, initialization tricks will not help you: the process
> will crash N times in a row, filling the logs with duplicate error
> messages, then the entire program will die. This is not acceptable for a
> server.
>
The focus is on initialization because it is a short period of time that
can have a timeout value to limit its execution, and because it is the
precondition for all later execution.  It is better to have something
fail during initialization than 5 days later. If a failure
after x days is difficult to replicate, you still don't want to wait
that length of time to test. That is why it is best to validate
everything during initialization, so that the open-ended runtime that
follows starts from a state known to be valid.  Otherwise you are just
wasting development time when bugs occur.
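
As a rough illustration of the point above, here is a minimal sketch of a
gen_server whose init/1 is bounded by a start timeout and validates its
configuration before the process is considered started. The module name
validated_worker, the INIT_TIMEOUT value, the config keys host and port,
and the validate/1 helper are all made up for this example. It only
checks local data, and a real worker would likely validate more.

```erlang
-module(validated_worker).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical upper bound on how long init/1 may run, in milliseconds.
-define(INIT_TIMEOUT, 5000).

start_link(Config) ->
    %% With {timeout, T}, the start call returns {error, timeout} if
    %% init/1 has not finished within T milliseconds.
    gen_server:start_link(?MODULE, Config, [{timeout, ?INIT_TIMEOUT}]).

init(Config) ->
    %% Fail now, during startup, rather than at first use days later.
    case validate(Config) of
        ok ->
            {ok, Config};
        {error, Reason} ->
            {stop, {invalid_config, Reason}}
    end.

%% Hypothetical validation: only the shape of the config is checked.
validate(#{host := Host, port := Port})
  when is_list(Host), is_integer(Port), Port > 0, Port < 65536 ->
    ok;
validate(Other) ->
    {error, {bad_config, Other}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Under a supervisor, a bad configuration makes the child start fail right
away with {error, {invalid_config, ...}} instead of surfacing later. This
does not address crashes that happen after initialization; it only keeps
the errors that can be caught early from being deferred.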

Best Regards,
Michael




More information about the erlang-questions mailing list