<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 07/07/2015 07:25 AM, Roberto
Ostinelli wrote:<br>
</div>
<blockquote
cite="mid:CAM5fRyrLRJwLC5KkFMj-hzv0uQRhAfsOQ1vqwk0FW3ZSE6JNeg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>Hi Fred,</div>
<div>Thank you for your input. Comments below.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">One
of the things mentioned in your article was that because
you used mostly unique device names, you didn't have to
worry much about conflicts in names, and could
consequently relax the consistency properties to go for
eventual consistency.<br>
<br>
There are, however, no details about how this takes place.
Attributes that are fun to know are:<br>
<br>
- What's the conflict resolution mechanism<br>
- how long does it take to detect a conflict<br>
- how long does it take to resolve a conflict<br>
<br>
For example, I looked at the following code: <a
moz-do-not-send="true"
href="https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L255-L262"
rel="noreferrer" target="_blank">https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L255-L262</a><br>
<br>
case CallbackModule of<br>
undefined -><br>
error_logger:warning_msg("Found a double
process for ~s, killing it on local node ~p", [Key,
node()]),<br>
exit(LocalProcessPid, kill);<br>
_ -> spawn(fun() -><br>
error_logger:warning_msg("Found a double
process for ~s, about to trigger callback on local node
~p", [Key, node()]),<br>
CallbackModule:CallbackFunction(Key,
LocalProcessPid) end)<br>
end<br>
<br>
This makes it look like it is possible for two nodes
to find conflicting pids, and if they find them at the same
time, both processes are killed at once. This can be
worked around by setting up a function that always picks
the same pid no matter who executes it (exit(max(P1,P2),
kill), for example), but always killing the local pid
risks having all nodes involved make that same decision,
leaving nobody alive as soon as there's a conflict.<br>
</blockquote>
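<div>The deterministic choice suggested above (exit(max(P1,P2), kill)) could be sketched roughly as follows. This is only an illustration: the module conflict_sketch and the functions pick/2 and resolve/2 are hypothetical names, not part of Syn, and the sketch assumes that pid ordering is the same on every node that evaluates it.</div>
<pre>
```erlang
-module(conflict_sketch).
-export([pick/2, resolve/2]).

%% Hypothetical helper, not part of Syn.
%% Returns {Survivor, Victim}. min/2 and max/2 follow Erlang's term
%% order, so the result does not depend on the order of the arguments.
pick(PidA, PidB) when is_pid(PidA), is_pid(PidB), PidA =/= PidB ->
    {min(PidA, PidB), max(PidA, PidB)}.

%% Kill the victim chosen by pick/2 and report which pid was kept.
resolve(PidA, PidB) ->
    {Survivor, Victim} = pick(PidA, PidB),
    exit(Victim, kill),
    {kept, Survivor}.
```
</pre>
<div>Because pick/2 ignores which pid is local, two nodes running it on the same pair would, under the assumption above, agree on the victim instead of each killing its own process.</div>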
<div><br>
</div>
<div><br>
</div>
<div>
<div>When a node is disconnected from the cluster, the
other nodes will remove from their mnesia tables all the
pids (and hence the keys) that run on the disconnected
node, and vice versa:</div>
<div><a moz-do-not-send="true"
href="https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L134">https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L134</a><br>
</div>
<div><br>
</div>
<div>This means that the disconnected node *does not* have
in its mnesia replica the keys of all the other nodes,
and the other nodes *do not* have in their mnesia
replicas the keys of the disconnected node.</div>
<div><br>
</div>
<div>If the disconnected node were to merge back in right
away (i.e. with no new registrations happening), there
simply wouldn't be any conflicts and everything would be
merged in.</div>
<div><br>
</div>
<div>In a more realistic scenario, the nodes of the
cluster and the disconnected node keep registering new
pids.</div>
<div>If, during the net split, there's no unique key that
has been used both on the disconnected node and on the
rest of the cluster, then we're back to the previous
scenario: everything gets merged in.</div>
<div>
<div>If the same unique key has been registered both on
the disconnected node and on the cluster, then we have
a conflict.</div>
</div>
<div><br>
</div>
<div>In this case, if you scroll up a little in the
code, you'll see that all of the merge code runs
inside a global lock:</div>
</div>
<div>
<div><a moz-do-not-send="true"
href="https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L180">https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L180</a><br>
</div>
<div><br>
</div>
<div>When one node starts the merge, the other nodes are
basically waiting. The risk of having both killed is
therefore non-existent. Or, I might have forgotten
something (it happens!), in which case I'd be delighted
to know and improve the code :)</div>
</div>
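<div>As a rough illustration of running a merge inside a global lock, here is a minimal sketch built on global:trans/2 from the kernel application. The module merge_lock_sketch, the function with_merge_lock/1 and the lock name are hypothetical and are not taken from Syn's code.</div>
<pre>
```erlang
-module(merge_lock_sketch).
-export([with_merge_lock/1]).

%% Hypothetical wrapper, not Syn's actual code.
%% global:trans/2 sets the lock on the local node and all connected
%% nodes, runs Fun, then releases the lock. Other callers using the
%% same resource id wait until the lock is free.
with_merge_lock(Fun) when is_function(Fun, 0) ->
    LockId = {merge_lock_sketch, self()},
    global:trans(LockId, Fun).
```
</pre>
<div>While Fun runs, any other process that calls with_merge_lock/1 blocks, so under this sketch two nodes cannot be inside the conflict-resolution code at the same time.</div>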
<div><br>
</div>
<div>Just to give you an example of what I've been
observing: 2 nodes, 1 million connected (and registered)
devices, a net split of 5 minutes, fewer than 10 conflicts,
resolved in less than 500ms (from the moment mnesia
signalled an inconsistent database to the moment the
global lock is released).<br>
</div>
</div>
</div>
</div>
</blockquote>
The handling of conflicts is important when classifying the system.
I have seen in the code that "doubles" are purged; these are likely
when the same name exists in two separate network partitions that
are attempting to merge back together. You have used the term
"eventually consistent" to basically mean "consistent until a
netsplit occurs", due to the loss of data when separate network
partitions are merged. Because a global lock is used to resolve any
conflicts that exist during the merge, you are losing availability
during that time period, even if it is only 500ms for 2 nodes with a
large number of processes. So your system is partition tolerant all
the time, while losing both consistency and availability
when a netsplit occurs.<br>
<br>
I understand this type of system matches your use case, but I think
it is important to be clear about the impact of netsplits.<br>
<br>
Best Regards,<br>
Michael<br>
<blockquote
cite="mid:CAM5fRyrLRJwLC5KkFMj-hzv0uQRhAfsOQ1vqwk0FW3ZSE6JNeg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">So
what could be the impact of this on a cluster where the
conflict rate is higher, say 80%? Would an app like Syn
mostly kill my entire cluster if I don't configure it
properly? Or maybe I misunderstood something from my very
brief reading of the code.<br>
</blockquote>
<div><br>
</div>
<div>Please consider that as per the use-case defined (IoT
applications), conflicts are extremely rare.</div>
<div>Your example would mean that 80% of the devices, during
a net split, connected both to the disconnected node and
the rest of the cluster. It is weird to say the least.<br>
</div>
<div><br>
</div>
<div>That being said, I have not benchmarked this
scenario, but here again we are talking about finding the
conflicting keys, and sending an exit signal to 1 of the 2
conflicting pids:</div>
<div><a moz-do-not-send="true"
href="https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L238">https://github.com/ostinelli/syn/blob/master/src/syn_consistency.erl#L238</a><br>
</div>
<div><br>
</div>
<div>These things are rather quick, even at 7-digit numbers.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">The
speed boost is interesting, but without more details about
the app's handling of conflict when the uniqueness of
names isn't guaranteed, it's hard to form a solid
idea of how it would go in the wild.<br>
</blockquote>
<div><br>
</div>
<div>If you mean uniqueness of names at a precise given
time, indeed. Syn is eventually consistent. </div>
<div><br>
</div>
<div><br>
</div>
<div>Best,</div>
<div>r.</div>
<div><br>
</div>
<div> </div>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>