<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1141" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face="Courier New" size=2>Global has a "deconflict" hook for globally
registered names.</FONT></DIV>
<DIV><FONT face="Courier New" size=2>The default behaviour is to pick one of the
conflicting processes</FONT></DIV>
<DIV><FONT face="Courier New" size=2>and kill it; another standard option is to
simply unregister the</FONT></DIV>
<DIV><FONT face="Courier New" size=2>name.</FONT></DIV>
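<DIV><FONT face="Courier New" size=2></FONT>&nbsp;</DIV>
<DIV><FONT face="Courier New" size=2>A minimal sketch of how the hook can be
selected, assuming the three-argument global:register_name/3 with one of the
resolve functions exported by global. The module and function names here are
made up for illustration only.</FONT></DIV>
<PRE>
-module(deconflict_sketch).   % hypothetical module name
-export([register_kill/2, register_unregister/2]).

%% Default behaviour: on a name clash, global picks one of the
%% conflicting processes and kills it.
register_kill(Name, Pid) -&gt;
    global:register_name(Name, Pid, fun global:random_exit_name/3).

%% Alternative: the name is simply unregistered and both
%% processes are notified of the conflict.
register_unregister(Name, Pid) -&gt;
    global:register_name(Name, Pid, fun global:notify_all_name/3).
</PRE>
<DIV><FONT face="Courier New" size=2>Both calls return yes or no, just like
global:register_name/2.</FONT></DIV>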
<DIV><FONT face="Courier New" size=2></FONT> </DIV>
<DIV><FONT face="Courier New" size=2>The locker (my contrib) performs a lock
merge and forcefully </FONT></DIV>
<DIV><FONT face="Courier New" size=2>releases locks that cannot be automatically
merged (e.g. two</FONT></DIV>
<DIV><FONT face="Courier New" size=2>different processes have an exclusive lock
on the same resource).</FONT></DIV>
<DIV><FONT face="Courier New" size=2></FONT> </DIV>
<DIV><FONT face="Courier New" size=2>Mnesia will detect partitioned networks,
but will not do anything</FONT></DIV>
<DIV><FONT face="Courier New" size=2>to resolve the situation automatically
(there is no automatic</FONT></DIV>
<DIV><FONT face="Courier New" size=2>solution that works in all
cases.)</FONT></DIV>
<DIV><FONT face="Courier New" size=2></FONT> </DIV>
<DIV><FONT face="Courier New" size=2>Dist_ac will not try to resolve partitioned
networks, so if you</FONT></DIV>
<DIV><FONT face="Courier New" size=2>use dist_ac, you have to restart all nodes
except one.</FONT></DIV>
<DIV><FONT face="Courier New" size=2></FONT> </DIV>
<DIV><FONT face="Courier New" size=2>The problem of partitioned networks is
really difficult, and </FONT></DIV>
<DIV><FONT face="Courier New" size=2>as Asko explains, we address it in AXD301
with the </FONT></DIV>
<DIV><FONT face="Courier New" size=2>'dist_auto_connect once' option. This also
gives some relief</FONT></DIV>
<DIV><FONT face="Courier New" size=2>from the general problem that a system that
automatically</FONT></DIV>
<DIV><FONT face="Courier New" size=2>reconnects from a partitioned net is quite
unstable during the</FONT></DIV>
<DIV><FONT face="Courier New" size=2>merge process.</FONT></DIV>
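<DIV><FONT face="Courier New" size=2></FONT>&nbsp;</DIV>
<DIV><FONT face="Courier New" size=2>For reference, a sketch of how the option
can be set as a kernel configuration parameter. The file layout is only
illustrative and is not the actual AXD301 configuration.</FONT></DIV>
<PRE>
%% sys.config fragment (illustrative)
[{kernel, [{dist_auto_connect, once}]}].

%% or, equivalently, on the command line:
%%   erl -kernel dist_auto_connect once
</PRE>
<DIV><FONT face="Courier New" size=2>With this setting, a node that has gone
down is not reconnected automatically; the connection has to be made
explicitly, e.g. with net_kernel:connect_node(Node).</FONT></DIV>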
<DIV><FONT face="Courier New" size=2></FONT> </DIV>
<DIV><FONT face="Courier New" size=2>/Uffe</FONT></DIV>
<DIV><FONT face="Courier New" size=2></FONT> </DIV>
<BLOCKQUOTE
style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>
<DIV
style="BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: black"><B>From:</B>
<A title=kramer@acm.org href="mailto:kramer@acm.org">Reto Kramer</A> </DIV>
<DIV style="FONT: 10pt arial"><B>To:</B> <A title=erlang-questions@erlang.org
href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</A>
</DIV>
<DIV style="FONT: 10pt arial"><B>Cc:</B> <A title=etxuwig@cbe.ericsson.se
href="mailto:etxuwig@cbe.ericsson.se">etxuwig@cbe.ericsson.se</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Sent:</B> 29 April 2003 22:14</DIV>
<DIV style="FONT: 10pt arial"><B>Subject:</B> Network partition and OTP</DIV>
<DIV><BR></DIV>I'm looking for information on how OTP behaves when the network
between nodes fails, and reconnects (nodes stay up all the time).<BR><BR>**
Question 1 **<BR>In particular the behavior of "global", the "distributed
application controller" and Ulf's "locker" (contrib page) is what I'd like to
understand better in network partition/reconnect scenarios.<BR><BR>I've found
references to work of Thomas Arts et al [1,2] and Ulf Wiger [3] and snippets
here and there, but it would be most helpful to me if an OTP wizard could
illuminate this topic comprehensively.<BR><BR>For "global" one has to expect
"name conflict" errors when the network comes back together. By extension I
guess the same applies to the application controller (via its use of global).
Not sure about Ulf's locker. Using Ulf's release handling tutorial example, I
can generate a naming conflict and observe what happens (start n1 then n2
(owner), suspend erl process that runs n2, dist fails over to n1, then resume
erl that runs n2, ping n1 -> naming conflict, kills dist_server on n2,
supervisor restarts n2 which takes over from n1 - takeover handshake not
logged - does it happen?).<BR><BR>=INFO REPORT==== 29-Apr-2003::12:59:39
===<BR>global: Name conflict terminating
{dist_server,&lt;1930.59.0&gt;}<BR><BR>** Question 2 ** is there any risk of
losing messages that were buffered by the dist_server instance just before it
got killed? I'm worried that while the global:register etc. calls are atomic
across nodes [docs and 2], a potential client (client of dist_server I mean
here) is not part of the atomic conflict resolution/re-registering
process.<BR><BR>I noticed the "relay" function in Ulf's release handling
tutorial [3], but am not sure it kicks in when global detects the naming
conflict upon reconnect - I guess not, correct?<BR><BR>** Question 3 ** -
somewhat related to the above:<BR>Is there any library support for "majority
voting" and/or "lease management" in OTP that I've not discovered yet? In
particular I'm interested in rejecting a global:register/2 if the process
calling the function is not in a node majority-set.<BR><BR>Thanks,<BR>-
Reto<BR><BR>References:<BR><BR>Thomas Arts et al [1,2], Ulf Wiger
[3]<BR><BR>[1] http://www.ericsson.com/cslab/~thomas/publ2.shtml (resource
locker case study)<BR>[2]
http://www.erlang.org/ml-archive/erlang-questions/200107/msg00031.html
(christian paper)<BR>[3] (OTP release handling tutorial by Ulf) - was on the
newsgroup, cannot find ref right now<BR><BR>______________________<BR>There
are two ways of constructing a software design. One way is to make it so
simple that there are <B>obviously</B> no deficiencies. And the other way is
to make it so complicated that there are no <B>obvious</B>
deficiencies.<BR><BR><I>C.A.R. Hoare<BR>1980 Turing Award
Lecture</I></BLOCKQUOTE></BODY></HTML>