[erlang-questions] [ANN] Syn: a global process registry

Roberto Ostinelli roberto@REDACTED
Tue Jul 7 16:25:31 CEST 2015


Hi Fred,
Thank you for your input. Comments below.



> One of the things mentioned in your article was that because you used
> mostly unique device names, you didn't have to worry much about conflicts
> in names, and could consequently relax the consistency properties to go for
> eventual consistency.
>
> There is however no details about how this takes place. Attributes that
> are fun to know are:
>
> - What's the conflict resolution mechanism
> - how long does it take to detect a conflict
> - how long does it take to resolve a conflict
>
> For example, I looked at the following code:
> https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L255-L262
>
>    case CallbackModule of
>        undefined ->
>            error_logger:warning_msg("Found a double process for ~s, killing it on local node ~p", [Key, node()]),
>            exit(LocalProcessPid, kill);
>        _ -> spawn(fun() ->
>            error_logger:warning_msg("Found a double process for ~s, about to trigger callback on local node ~p", [Key, node()]),
>            CallbackModule:CallbackFunction(Key, LocalProcessPid) end)
>    end
>
> And this makes it look like it is possible for two nodes to find
> conflicting pids, and if they find it at the same time, both processes are
> killed at once. This can be worked-around by setting up a function that
> always picks the same pid no matter who executes it (exit(max(P1,P2),
> kill), for example), but killing the local pid always risks having all
> nodes involved making that same decision and then having nobody left as
> soon as there's a conflict.
>


When a node is disconnected from the cluster, the other nodes will remove
from their mnesia tables all the pids (and hence the keys) that run on the
disconnected node, and vice versa:
https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L134

This means that the disconnected node *does not* have in its
mnesia replica the keys of all the other nodes, and the other nodes *do
not* have in their mnesia replicas the keys of the disconnected node.
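
The cleanup described above can be sketched roughly as follows. This is only
an illustration, not Syn's actual code: a plain map of Key => Pid stands in
for the mnesia table, and the module and function names are made up for the
example.

```erlang
-module(registry_purge_sketch).
-export([purge_node/2]).

%% Registry is a map of Key => Pid (a stand-in for the mnesia table).
%% Returns the registry without any entry whose pid runs on Node,
%% which is what each side does when it sees the other side go down.
-spec purge_node(node(), #{term() => pid()}) -> #{term() => pid()}.
purge_node(Node, Registry) ->
    maps:filter(fun(_Key, Pid) -> node(Pid) =/= Node end, Registry).
```

Each side of the split would run this against the nodes it can no longer
see, which is why neither side keeps stale keys belonging to the other.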

If the disconnected node were to merge back in right away (i.e. with no new
registrations happening), there simply wouldn't be any conflicts and
everything would be merged in.

In a more realistic scenario, the nodes of the cluster and the disconnected
node keep registering new pids.
If, during the net split, there's no unique key that has been used both on
the disconnected node and on the rest of the cluster, then we're back to
the previous scenario: everything gets merged in.
If the same unique key has been registered both on the disconnected node
and on the cluster, then we have a conflict.
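
To make the two cases concrete, here is a minimal sketch of such a merge. It
is an assumption-laden illustration rather than the code in
syn_consistency.erl: both sides are modelled as maps of Key => Pid, and the
module and function names are hypothetical.

```erlang
-module(merge_conflicts_sketch).
-export([merge/2]).

%% Local and Remote are maps of Key => Pid, one per side of the net split.
%% Returns {Merged, Conflicts}:
%%   - Merged starts as Local and gains every remote key that does not clash;
%%     a clashing key keeps its local pid until the conflict is resolved.
%%   - Conflicts lists {Key, LocalPid, RemotePid} for keys registered on
%%     both sides with different pids.
merge(Local, Remote) ->
    maps:fold(
        fun(Key, RemotePid, {Merged, Conflicts}) ->
            case maps:find(Key, Local) of
                {ok, LocalPid} when LocalPid =/= RemotePid ->
                    {Merged, [{Key, LocalPid, RemotePid} | Conflicts]};
                _ ->
                    {maps:put(Key, RemotePid, Merged), Conflicts}
            end
        end,
        {Local, []},
        Remote).
```

If no key was registered on both sides, Conflicts comes back empty and the
result is a plain union of the two registries.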

In this case, if you scroll a little above in the code, you'll see that
all of the merge code runs inside of a global lock:
https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L180

When one node starts the merge, the other nodes are basically waiting. The
risk of having both killed is therefore non-existent. Or, I might have
forgotten something (it happens!), in which case I'd be delighted to know
and improve the code :)
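
For readers who have not used it, this is roughly what running a merge under
a cluster-wide lock looks like with the standard global:trans/2 call. The
sketch assumes that kind of lock; the lock id and the module and function
names are invented for the example and are not taken from Syn.

```erlang
-module(merge_lock_sketch).
-export([locked_merge/1]).

%% Runs MergeFun while holding a cluster-wide lock.
%% global:trans/2 takes the lock on all connected nodes and retries until
%% it gets it (the default number of retries is infinity), so a second node
%% calling this with the same resource id waits until the first is done.
locked_merge(MergeFun) when is_function(MergeFun, 0) ->
    %% {ResourceId, LockRequesterId}; the resource id here is hypothetical.
    LockId = {{?MODULE, merge}, self()},
    global:trans(LockId, MergeFun).
```

Since only one lock holder at a time can be inside MergeFun, two nodes cannot
both be in the middle of deciding which pid to kill for the same key.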

Just to give you an example of what I've been observing: 2 nodes, 1 million
connected (and registered) devices, a net split of 5 minutes, fewer than 10
conflicts, resolved in less than 500ms (from the moment mnesia signalled an
inconsistent database to the moment the global lock was released).


> So what could be the impact of this on a cluster where the conflict rate is
> higher, say 80%? Would an app like Syn mostly kill my entire cluster if I
> don't configure it properly? Or maybe I misunderstood something from my
> very brief reading of the code.
>

Please consider that as per the use-case defined (IoT applications),
conflicts are extremely rare.
Your example would mean that 80% of the devices, during a net split,
connected both to the disconnected node and the rest of the cluster. It is
weird to say the least.

That being said, I have not benchmarked this scenario, but here again
we are talking about finding the conflicting keys, and sending an exit
signal to 1 of the 2 conflicting pids:
https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L238

These operations are rather quick, even with key counts in the 7 digits.
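
As a rough illustration of that resolution step, modelled on the snippet
quoted earlier in this thread: for each conflicting key, either kill the
local pid or hand it to a user callback. The module name, the
{Module, Function} callback shape and the {Key, LocalPid, RemotePid} tuples
are assumptions made for the sketch.

```erlang
-module(conflict_resolve_sketch).
-export([resolve_all/2]).

%% Conflicts is a list of {Key, LocalPid, RemotePid}.
%% Callback is 'undefined' or {Module, Function}.
resolve_all(Conflicts, Callback) ->
    lists:foreach(
        fun({Key, LocalPid, _RemotePid}) ->
            resolve(Key, LocalPid, Callback)
        end,
        Conflicts).

%% No callback configured: send an untrappable exit signal to the local pid.
resolve(_Key, LocalPid, undefined) ->
    exit(LocalPid, kill);
%% Callback configured: let the application decide, in a separate process
%% so that a slow callback does not hold up the rest of the resolution.
resolve(Key, LocalPid, {Module, Function}) ->
    spawn(fun() -> Module:Function(Key, LocalPid) end).
```

The work per conflict is one exit signal or one spawn, so the cost grows
linearly with the number of conflicting keys.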



> The speed boost is interesting, but without more details about the app's
> handling of conflict when the uniqueness of names isn't guaranteed, it's
> hard to make myself a solid idea of how it would go in the wild.
>

If you mean uniqueness of names at any precise moment in time, indeed: Syn is
eventually consistent.


Best,
r.