[erlang-questions] Distributed application and netsplit

Karolis Petrauskas k.petrauskas@REDACTED
Wed Nov 19 07:41:58 CET 2014


I agree that recovery can be application-specific. The problem here
is that there is no way to resolve the netsplit, even manually.

In my case, I use the global process registry, so I get notified when
the cluster recovers from a netsplit. On that event the resolve
function (see global:register_name/3) receives the pids from both
nodes. From the pids I can determine the nodes on which the processes
are running, and then decide which node should keep running and which
should be stopped. The only problem in my case is that there is no way
to stop the secondary node explicitly.
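
To illustrate the resolve step described above, here is a minimal
sketch, not the actual code in use. The module name, the priority list,
the node names and the name_conflict_lost message are all assumptions
invented for the example. It only shows how the pid to keep could be
chosen from the two pids; it does not stop the distributed application
on the losing node, which is exactly the missing piece.

```erlang
-module(netsplit_resolve).
-export([register_name/2, resolve/3, winner_node/3]).

%% Hypothetical priority list: nodes listed earlier win. The node names
%% are placeholders, not taken from any real configuration.
priority() -> ['primary@host1', 'secondary@host2'].

%% Register Pid under Name with a custom resolve function.
%% global calls the resolve function when a name clash is detected,
%% e.g. when two partitions reconnect after a netsplit.
register_name(Name, Pid) ->
    global:register_name(Name, Pid, fun ?MODULE:resolve/3).

%% Called by global as Resolve(Name, Pid1, Pid2); must return the pid
%% that keeps the name (or 'none').
resolve(Name, Pid1, Pid2) ->
    Node1 = node(Pid1),
    case winner_node(priority(), Node1, node(Pid2)) of
        Node1 -> notify_loser(Name, Pid2, Pid1), Pid1;
        _     -> notify_loser(Name, Pid1, Pid2), Pid2
    end.

%% Tell the losing process which node won (hypothetical message format).
notify_loser(Name, Loser, Winner) ->
    Loser ! {name_conflict_lost, Name, node(Winner)},
    ok.

%% Pick the node that appears earlier in Priority; ties go to NodeA.
winner_node(Priority, NodeA, NodeB) ->
    case rank(NodeA, Priority) =< rank(NodeB, Priority) of
        true  -> NodeA;
        false -> NodeB
    end.

%% Position of Node in the list; unknown nodes rank after all known ones.
rank(Node, Priority) -> rank(Node, Priority, 0).

rank(_Node, [], N) -> N;
rank(Node, [Node | _], N) -> N;
rank(Node, [_ | Rest], N) -> rank(Node, Rest, N + 1).
```

The losing process learns which node won and can shut itself down, but
the application controller on its node still considers the distributed
application to be running there.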

Imants, your suggestions imply rewriting a large portion of dist_ac
(http://www.erlang.org/doc/design_principles/distributed_applications.html).
I consider that plan B.

I could make a minimal test case to reproduce my problem, but I am
not sure what to show there. I can show that a distributed application
keeps running on both nodes after recovery from a netsplit. However, I
doubt that will be considered a bug. Most likely it is a missing
feature: the ability to fix the situation explicitly.

Karolis

On Wed, Nov 19, 2014 at 3:02 AM, Imants Cekusins <imantc@REDACTED> wrote:
> does monitor_nodes not keep subscribers up-to-date?
>
> The proxy apps could store a list of nodes in order of priority. Each app
> instance but the one running in the top priority active node would stop
> their worker.
>
> ?


