EEP proposal - Delayed restarts of supervisor children
Viktor Söderqvist
viktor@REDACTED
Mon Jun 21 18:23:46 CEST 2021
On 2021-06-21 17:21, Fred Hebert wrote:
> At the very least, we should find ways to provide guidance, some
> libraries, demos or samples, or see if there could be a way to create a
> "client" behaviour that could take that common state machine of
> disconnected --> [connecting -->] connected and augment it with backoff
> or even circuit breaker mechanisms ("give the name of the shared circuit
> breaker your clients are using"), which would far more easily let people
> put the fault-handling behaviour close to the error-handling mechanisms
> and bring the decision making to app-specific concerns, and create
> extensible mechanisms.
EEP XXX: New behaviour "gen_client"
Very nice, Fred! Feeling up for it?
How do you feel about the pattern where you have a manager process
alongside a supervisor? Connection pools typically have this structure.
Do you think such manager processes are a reasonable place for delay logic?
I had this scenario some years ago: There are a few replicas of a
database, which are used for read-only access to offload the master
database. For each of these replicas, you have a connection pool (poolboy
or some other). Each db replica may be down, but it may also just be
lagging too far behind in replication (there's a way to query this), in
which case you don't want to use it until it has caught up.
I used a manager worker process alongside a supervisor of all the pools.
The manager could start/stop the connection pools by adding them to or
removing them from the supervisor, and additionally keep track of which
pools are usable and which aren't. If a replica is down, there's no point
in having all its connection processes stuck in reconnect-loops, so I'd
stop them and remove them from the supervision tree. Any pitfalls with
this design?
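
A minimal sketch of the bookkeeping such a manager might do, written in
Python only to keep it short and language-neutral. All names here
(PoolManager, ReplicaStatus, report, usable_pools) are hypothetical. In
Erlang, adding and removing a pool would roughly correspond to
supervisor:start_child/2 and supervisor:terminate_child/2 followed by
supervisor:delete_child/2. Keeping the pool of a lagging replica running
is an assumption of this sketch, not something stated above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReplicaStatus(Enum):
    DOWN = "down"
    LAGGING = "lagging"
    HEALTHY = "healthy"


@dataclass
class PoolManager:
    """Hypothetical manager state: which pools run, and which may serve reads."""

    running: set = field(default_factory=set)  # pools currently supervised
    usable: set = field(default_factory=set)   # pools safe to hand out

    def report(self, replica: str, status: ReplicaStatus) -> None:
        """Update the bookkeeping after a health check of one replica."""
        if status is ReplicaStatus.DOWN:
            # Stop and remove the pool instead of leaving its workers
            # stuck in reconnect loops.
            self.running.discard(replica)
            self.usable.discard(replica)
        elif status is ReplicaStatus.LAGGING:
            # Assumption: keep the connections, but do not route reads
            # to this replica until it has caught up.
            self.running.add(replica)
            self.usable.discard(replica)
        else:
            self.running.add(replica)
            self.usable.add(replica)

    def usable_pools(self) -> list:
        """Return the names of pools that may serve reads, sorted."""
        return sorted(self.usable)
```

With this shape, the delay logic (how long to wait before re-checking a
replica that is down) would live in the manager rather than in the
supervisor or in each connection process.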
On a different note, regarding automatic reconnects in clients: They may be
problematic, since there may be some state associated with the
connection (such as an ongoing database transaction) which is lost if
automatic reconnect is done without care. Crashing instead of
reconnecting makes this handling way simpler (or at least it moves the
problem to somewhere else). How would you best solve this using the
hypothetical gen_client behaviour?
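
To make the concern concrete, here is a small sketch (Python, all names
hypothetical, no real driver involved) of a client that reconnects
silently only when no connection-bound state exists, and otherwise fails
loudly so the caller can restart the transaction. It is one possible
shape of "reconnect with care", not a proposal for how gen_client
should work.

```python
class ConnectionLost(Exception):
    """Raised when the connection drops while connection-bound state exists."""


class Client:
    """Hypothetical client with a disconnected/connected state and a transaction flag."""

    def __init__(self):
        self.state = "disconnected"
        self.in_transaction = False

    def connect(self):
        # A real client would open a socket here; the sketch only flips state.
        self.state = "connected"

    def begin(self):
        if self.state != "connected":
            raise RuntimeError("not connected")
        self.in_transaction = True

    def commit(self):
        self.in_transaction = False

    def on_disconnect(self):
        """Handle a lost connection."""
        self.state = "disconnected"
        if self.in_transaction:
            # The server-side transaction is gone; reconnecting silently
            # would hide that from the caller, so fail instead.
            self.in_transaction = False
            raise ConnectionLost("transaction state lost; caller must restart it")
        # Nothing to lose, so a transparent reconnect is safe.
        self.connect()
```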
Viktor