[erlang-questions] Distributed application and netsplit
Wed Nov 19 21:49:19 CET 2014
The problem is how, once the net split has been detected, to manually shut down the unwanted application instance(s) in such a way that distributed application failover continues to work from that point forward.
The OP already has application-specific code which is able to detect that a net split / split brain scenario has occurred.
Furthermore, the OP is “ok” writing application specific code to attempt to resolve the situation at that point.
The question is, what is the “best” way to go about doing that?
Is it not possible to shut down all of the unwanted application instances in such a way that distributed application failover continues to work from that point forward?
Or is it really necessary to restart *all* of the nodes on which the distributed application may run?
Clearly that would be, well, less than ideal.
Then again, so would rewriting dist_ac...
On Nov 19, 2014, at 12:26 PM, Raoul Duke <raould@REDACTED> wrote:
>> keeps running on both nodes after recovery from netsplit. However, I
>> doubt it will be considered a bug. Most likely, it is a missing feature:
>> the ability to fix the situation explicitly.
> The wording continues to worry me. There is no feature that will fix
> the situation in the largest sense. You acknowledge that already so I
> know you know it. :-) So what I would urge us to say is that we want
> there to be a signal after the network heals. Expectation of how soon
> after the healing the signal will get through, however, must be
> carefully constrained.
> I assume what people would do is "simply" ping constantly. It could
> be a star-network ping, it could be a P2P ping, it could be a
> broadcast ping... Even with this, there are many ways to skin the
> chicken. It will depend on each situation, I would hazard a guess: on
> what the application's network is like in the first place.
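As a rough illustration of the "ping constantly" idea, here is a minimal sketch of the peer-to-peer variant, assuming each node runs a small watcher process. The module name `split_heal`, the `OnHeal` callback, and the policy of treating a `nodeup` for a node previously seen going down as "the network has healed" are all hypothetical choices, not anything in OTP or dist_ac. The only real APIs used are `net_kernel:monitor_nodes/1` and `net_adm:ping/1`. The sketch only produces the signal; deciding which instance to shut down, and how, is left to the application-specific `OnHeal` callback, which is exactly the open question in this thread.

```erlang
%% split_heal: a hypothetical helper module, not part of OTP.
-module(split_heal).
-export([start/2, handle_event/2]).

%% Spawn a watcher that subscribes to node up/down events and calls
%% OnHeal(Node) when a node that had gone down becomes reachable again.
start(IntervalMs, OnHeal) when is_integer(IntervalMs), is_function(OnHeal, 1) ->
    spawn(fun() ->
                  %% Deliver {nodeup, Node} / {nodedown, Node} to this process.
                  ok = net_kernel:monitor_nodes(true),
                  loop(IntervalMs, OnHeal, [])
          end).

%% Pure state transition. Down is the list of nodes currently believed
%% unreachable; returns {NewDown, Healed}, where Healed is true only if
%% the node coming up was previously seen going down.
handle_event({nodedown, Node}, Down) ->
    {lists:usort([Node | Down]), false};
handle_event({nodeup, Node}, Down) ->
    {lists:delete(Node, Down), lists:member(Node, Down)}.

loop(IntervalMs, OnHeal, Down) ->
    receive
        {Tag, Node} = Event when Tag =:= nodeup; Tag =:= nodedown ->
            {NewDown, Healed} = handle_event(Event, Down),
            case Healed of
                true -> OnHeal(Node);   % application-specific resolution (hypothetical)
                false -> ok
            end,
            loop(IntervalMs, OnHeal, NewDown)
    after IntervalMs ->
            %% Periodically ping the nodes we lost. A pong re-establishes the
            %% distribution connection, which in turn produces a nodeup message.
            lists:foreach(fun net_adm:ping/1, Down),
            loop(IntervalMs, OnHeal, Down)
    end.
```

Usage might look like `split_heal:start(5000, fun(Node) -> io:format("healed: ~p~n", [Node]) end).` Note that `net_adm:ping/1` can block while a connection attempt to an unreachable node times out, so the effective polling interval may be longer than `IntervalMs`; this is one concrete form of the caveat above about how soon after healing the signal gets through.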