erlang cluster partitioned

Dániel Szoboszlay dszoboszlay@REDACTED
Tue Nov 30 14:40:11 CET 2021


You can create a process that calls erlang:monitor_node(Node, true) for every
node expected to be in the cluster. Whenever that process receives a
{nodedown, Node} message, it can call monitor_node on the failed node again
after some cooldown time. Since monitor_node attempts to connect to the node
if it's not already connected (and delivers another nodedown if that attempt
fails), this is enough to restore failed connections.
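
A minimal sketch of that idea, assuming a statically configured node list.
The module name node_reconnector, the start/1 and start/2 functions and the
5 second default cooldown are hypothetical, picked only for illustration;
the only OTP calls used are erlang:monitor_node/2 and erlang:send_after/3.

```erlang
-module(node_reconnector).
-export([start/1, start/2, init/2]).

%% Hypothetical default cooldown between reconnect attempts (milliseconds).
-define(DEFAULT_COOLDOWN_MS, 5000).

%% Nodes is the static list of nodes expected to be in the cluster.
start(Nodes) ->
    start(Nodes, ?DEFAULT_COOLDOWN_MS).

start(Nodes, CooldownMs) ->
    spawn(?MODULE, init, [Nodes, CooldownMs]).

init(Nodes, CooldownMs) ->
    %% monitor_node/2 tries to connect if there is no connection yet.
    %% If that fails, a nodedown message is delivered to this process.
    %% The local node is skipped, since it never needs reconnecting.
    lists:foreach(fun(N) -> erlang:monitor_node(N, true) end,
                  [N || N <- Nodes, N =/= node()]),
    loop(CooldownMs).

loop(CooldownMs) ->
    receive
        {nodedown, Node} ->
            %% Schedule the retry with a timer instead of sleeping, so
            %% nodedown messages for other nodes are still handled.
            erlang:send_after(CooldownMs, self(), {retry, Node}),
            loop(CooldownMs);
        {retry, Node} ->
            %% Re-arm the monitor, which also triggers a connection attempt.
            erlang:monitor_node(Node, true),
            loop(CooldownMs);
        _Other ->
            %% Drop unrelated messages so the mailbox does not grow.
            loop(CooldownMs)
    end.
```

You would start it once per node, for example with
node_reconnector:start(['a@host1', 'b@host2']) (node names made up here),
on a node that is already running in distributed mode. In a real system you
would probably want this under a supervisor rather than a bare spawn.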

Cheers,
Daniel

On Tue, 30 Nov 2021 at 06:57, saket chaudhary <saketcmf@REDACTED> wrote:

> There are no firewalls to speak of. Things work as is all the time,
> except when we hear of network activity such as router or switch upgrades
> in parts of the network that we have no control over. But our app needs to
> be resilient to that. Things also work when the entire cluster gets restarted.
>
> What must be done to make sure we have a fully formed mesh that can
> withstand temporary disruptions and heal itself eventually? Should I write
> something that ensures every node pings every other node in the cluster
> that's statically configured?
>