[erlang-questions] Distributed Process Registry

Mon Feb 9 08:16:13 CET 2015

> On 09 Feb 2015, at 00:51, Michael Truog <mjtruog@REDACTED> wrote:
> 
> If resolving the separate chunks of data that exist after a netsplit requires user source code, to pick which data is "correct", that cannot be consistent and is an arbitrary process (ad-hoc, based on your use-case).  I don't believe that is being partition tolerant, but is instead ignoring the problem of partition tolerance and telling the user: "you should really figure this out".

But it’s a pathological problem! (see e.g. the Byzantine Generals dilemma). There is no generic way to resolve a netsplit that will lead to a *functionally consistent* system in every case. That is, data consistency in itself is not the end goal, but rather that conflicts are resolved in a way that the system functions as well as possible afterwards.

But saying that a library is inconsistent and arbitrary just because it requires user-level logic is not very helpful. By this reasoning, lists:foldl/3 is arbitrary, since it doesn’t understand how to fold a list in a way that’s ‘right’ for the user every time. If you accept that user intervention is _required_, at least some of the time (see e.g. lists:sort/1 vs lists:sort/2), you need to provide users with the hooks needed to do the job.

>> I agree that it can be a problem in a given system that different components automatically try to resolve an inconsistency using potentially different strategies. For this reason, I’ve long argued that one should have one master arbiter; the other systems need to be able to adapt. Otherwise, the different conflict resolution decisions can actually _cause_ inconsistencies from a system perspective.
> This stance appears to be contradicted by usage of gproc properties. You can have automatic conflict resolution that does not cause inconsistencies, i.e., it does not need to be a manual process that requires a master arbiter.

What I was referring to here was that different parts of a system can end up ‘locally consistent’, but in ways that are not consistent at the system level. For example, if you have a master/slave system where several components independently elect a master, the system will be inconsistent if they end up electing different nodes, _and you expected them to end up with the same master_. Note that whatever is considered wrong is completely a local issue: it might be undesirable if they actually _did_ elect the same master. Either way, if their respective decisions should not be regarded as independent, it is arguably better to have an arbiter decide and tell whoever needs to know.

In a very early version of the AXD 301 cluster controller, I used global to elect a leader instance. When the nodes reconnected after a netsplit, the cluster controller would immediately start trying to heal the system, but so did global. At that time, global had only one conflict resolution method: it would randomly pick one of the processes and kill it! It was annoying, to say the least, for the cluster controller to try to heal the system while global was gunning for it! In this case, we decided that it would be better for global to simply unregister the name - the cluster controller would then automatically elect a new master.
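As a minimal sketch of that choice (not the original AXD 301 code), present-day global lets the caller pass a resolve function to global:register_name/3. One of the functions it ships with, global:notify_all_name/3, unregisters both processes on a name clash and sends each of them {global_name_conflict, Name, OtherPid}, instead of killing one. The module name and the registered name cluster_leader are made up for illustration.

```erlang
-module(leader_reg).
-export([register_leader/0]).

%% Register the calling process under a hypothetical global name.
%% If a name clash is detected when partitions rejoin, global
%% unregisters both processes and notifies each of them, leaving
%% the application free to run its own election.
register_leader() ->
    global:register_name(cluster_leader, self(),
                         fun global:notify_all_name/3).
```

The registered process would then handle {global_name_conflict, cluster_leader, OtherPid} in its own message loop and decide what to do next.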

The work on the AXD 301 cluster controller resulted in several additions to OTP and decisions on how to handle this sort of thing. For one, the kernel parameter 'dist_auto_connect once' was introduced to allow the application logic to decide _when_ to reconnect two nodes. Other additions were configurable conflict resolution methods in global, and the decision to let the user decide how to handle inconsistencies in mnesia. Basically, we needed to be able to _impose_ our view of conflict resolution on OTP, and we were pretty sure that the AXD 301 way of resolving netsplits was not a universal method. But it does not make sense to call it ‘arbitrary’ (even though one can choose a perspective, and an interpretation of the word, that makes the claim valid). The system was architected to handle netsplits in a consistent and robust way.
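To illustrate the reconnect control, here is a rough sketch that assumes the node was started with "erl -kernel dist_auto_connect once". With that setting a node that has gone down is not reconnected automatically, so the application has to call net_kernel:connect_node/1 when it considers it safe. The module name and the ShouldHeal predicate are hypothetical stand-ins for whatever application logic makes that decision.

```erlang
-module(reconnect).
-export([maybe_reconnect/2]).

%% Assumes: erl -kernel dist_auto_connect once
%% ShouldHeal is a hypothetical application-level predicate that
%% decides whether it is safe to rejoin Node right now.
maybe_reconnect(Node, ShouldHeal) when is_function(ShouldHeal, 1) ->
    case ShouldHeal(Node) of
        true  -> net_kernel:connect_node(Node);  % explicit reconnect
        false -> deferred                        % stay partitioned for now
    end.
```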

Yes, there are solutions that automatically avoid inconsistencies. One of the design decisions in gproc was to only allow processes to register their own names/properties. This is rather a matter of _conflict avoidance_. Registering a property doesn’t impose any restriction on other processes, like unique name registration does.
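A small sketch of the difference, assuming the third-party gproc application is available: two processes each register the same local property for themselves, and neither registration blocks the other, whereas a second registration of the same unique name ({n, l, Name}) would fail. The module name and the key wants_events are made up for illustration.

```erlang
-module(gproc_props).
-export([demo/0]).

%% Assumes gproc is in the code path; it is not part of OTP.
demo() ->
    {ok, _} = application:ensure_all_started(gproc),
    Key = {p, l, wants_events},          % hypothetical local property key
    Parent = self(),
    Pids = [spawn(fun() ->
                      %% Each process registers the property for itself.
                      true = gproc:reg(Key, N),
                      Parent ! {registered, self()},
                      receive stop -> ok end
                  end) || N <- [1, 2]],
    %% Wait until both processes have registered.
    [receive {registered, P} -> ok end || P <- Pids],
    Found = lists:sort(gproc:lookup_pids(Key)),
    [P ! stop || P <- Pids],
    %% Both processes hold the same property at the same time.
    Found =:= lists:sort(Pids).
```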

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com