Divergence in globally registered names

Dániel Szoboszlay dszoboszlay@REDACTED
Tue Oct 13 22:55:16 CEST 2020


What would be a good alternative depends on your use case. If you just want
to be able to address remote processes by name and you don't mind eventual
consistency and having multiple processes registered under the same name,
OTP 23's pg module could work for you. It's got a very nice and clean
implementation that is easy to follow and doesn't seem to have hidden
assumptions and tricky edge cases like global. However, if what you need is
more like leader election (so you always want at most one process to be
registered under a given name), you should rather use some quorum-based
algorithm, like Raft (which already has Erlang implementations).
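
A minimal sketch of the pg approach, assuming OTP 23 or later and the
default pg scope. The module name pg_name_sketch and its two functions are
made up for illustration; only the pg calls themselves are real API:

```erlang
-module(pg_name_sketch).
-export([register_name/1, whereis_name/1]).

%% Make sure the default pg scope is running on this node. In a real
%% system you would start pg under your own supervision tree instead.
ensure_pg() ->
    case pg:start(pg) of
        {ok, _Pid} -> ok;
        {error, {already_started, _Pid}} -> ok
    end.

%% "Register" the calling process under Name by joining a group.
%% Unlike global, several processes may join the same group.
register_name(Name) ->
    ok = ensure_pg(),
    pg:join(Name, self()).

%% Look up a process by name. Membership is eventually consistent, so
%% the member list may be empty, stale, or hold more than one pid.
whereis_name(Name) ->
    ok = ensure_pg(),
    case pg:get_members(Name) of
        [] -> undefined;
        [Pid | _] -> Pid    % arbitrary pick among the members
    end.
```

Callers have to tolerate the lookup returning a different member on
different nodes, which is exactly the trade-off described above.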


On Tue, 13 Oct 2020 at 20:31, saket chaudhary <saketcmf@REDACTED> wrote:

> Thanks Daniel for the explanation. The fact that convergence has to be
> forced manually sounds like a deal-breaker for me. What would be a good
> alternative to 'global'?
> On Mon, Oct 12, 2020 at 2:35 PM Dániel Szoboszlay <dszoboszlay@REDACTED>
> wrote:
>> Hi,
>> Global can indeed end up in inconsistent states if some nodes get
>> disconnected from each other (so you're no longer running on a fully
>> connected mesh). When you register a global name on node X, the change
>> is only propagated to the nodes that X is directly connected to. So you
>> can end up in a situation where X and Y are connected, so they both know
>> about the name, and Y and Z are connected but X and Z are not, so Z
>> never gets the update.
>> When two nodes (re)connect, they only compare the names they locally know
>> about. So it is a bit tricky, but you can actually end up in a situation
>> where all nodes are connected, yet the global name databases are
>> inconsistent. You will need at least 4 nodes for this scenario to happen
>> (e.g. A, B, C & D):
>>    1. All nodes are connected initially.
>>    2. A gets disconnected from C.
>>    3. A registers process X under some name: this gets propagated to B &
>>    D, but not C.
>>    4. B gets disconnected from D.
>>    5. B re-registers the same name for process Y: this gets propagated to
>>    A & C, but not D, so on D the name still belongs to X.
>>    6. A reconnects to C. Since they both know the name belongs to Y, they
>>    will inform their half of the network about the new node, but won't issue
>>    any global name updates.
>>    7. You have all 4 nodes connected again, but A, B & C believe the
>>    name belongs to Y, while D believes it belongs to X.
>> So this can happen, and if you know how global works you can understand
>> why, but I don't think many people would expect it to actually
>> happen. :)
>> global:sync() is not really meant to resolve this error. The only
>> solution I know about is to manually compare global name registrations
>> shortly after you see a new node connecting.
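
A rough sketch of such a comparison, assuming it is run on one node of the
cluster (for example from a remote shell). The module name
global_check_sketch and its functions are hypothetical; it only relies on
rpc:call/4, global:registered_names/0 and global:whereis_name/1, and asks
each node for its own local view:

```erlang
-module(global_check_sketch).
-export([collect/0, divergent/1]).

%% Ask this node and every connected node for its own view of the
%% global name table: [{Node, [{Name, Pid | undefined}]}].
collect() ->
    [{Node, view(Node)} || Node <- [node() | nodes()]].

%% One node's view. A node that cannot be reached reports no names.
view(Node) ->
    case rpc:call(Node, global, registered_names, []) of
        Names when is_list(Names) ->
            [{Name, rpc:call(Node, global, whereis_name, [Name])}
             || Name <- Names];
        {badrpc, _Reason} ->
            []
    end.

%% Pure helper: return every name on which the nodes disagree, together
%% with what each node believes, as [{Name, [{Node, Pid | undefined}]}].
divergent(Views) ->
    Names = lists:usort([Name || {_Node, Regs} <- Views,
                                 {Name, _Pid} <- Regs]),
    [{Name, PerNode}
     || Name <- Names,
        PerNode <- [[{Node, proplists:get_value(Name, Regs)}
                     || {Node, Regs} <- Views]],
        length(lists:usort([Pid || {_N, Pid} <- PerNode])) > 1].
```

Calling global_check_sketch:divergent(global_check_sketch:collect())
returns an empty list when all nodes agree. What to do about a divergent
name (for example re-registering it) is deliberately left out.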
>> Cheers,
>> Daniel
>> On Mon, 12 Oct 2020 at 09:23, saket chaudhary <saketcmf@REDACTED> wrote:
>>> We hit an issue in production where two Erlang nodes in the same
>>> cluster agreed on the set of neighbour nodes (nodes() call) but diverged on
>>> the globally registered names (global:registered_names()). We're running OTP
>>> 23.0.2, but have hit these issues infrequently in the past with OTP 17 as
>>> well.
>>> Calling global:sync() or even net_adm:ping/1 for the remote node that
>>> had the globally registered process didn't help. We verified global
>>> registration of new names was being propagated across all the nodes.
>>> However, it didn't help fix the old names that had diverged. Ultimately, we
>>> had to manually re-register the name using a remote shell.
>>> Does anyone know if this is expected? I thought the Erlang nodes would
>>> gossip their way through to resolve any inconsistencies.