Network partition and OTP
Reto Kramer
kramer@REDACTED
Tue Apr 29 22:14:22 CEST 2003
I'm looking for information on how OTP behaves when the network between
nodes fails and later reconnects (the nodes themselves stay up the whole
time).
** Question 1 **
In particular the behavior of "global", the "distributed application
controller" and Ulf's "locker" (contrib page) is what I'd like to
understand better in network partition/reconnect scenarios.
I've found references to work of Thomas Arts et al [1,2] and Ulf Wiger
[3] and snippets here and there, but it would be most helpful to me if
an OTP wizard could illuminate this topic comprehensively.
For "global" one has to expect "name conflict" errors when the network
comes back together. By extension I guess the same applies to the
application controller (via its use of global). I'm not sure about Ulf's
locker. Using Ulf's release handling tutorial example, I can generate
a naming conflict and observe what happens:
  1. start n1, then n2 (the owner)
  2. suspend the erl process that runs n2; dist fails over to n1
  3. resume the erl process that runs n2 and ping n1
  4. a naming conflict is detected, which kills dist_server on n2
  5. the supervisor restarts n2, which takes over from n1
The takeover handshake is not logged. Does it happen?
=INFO REPORT==== 29-Apr-2003::12:59:39 ===
global: Name conflict terminating {dist_server,<1930.59.0>}
** Question 2 **
Is there any risk of losing messages that were buffered by the
dist_server instance just before it got killed? I'm worried that while
the global:register_name etc. calls are atomic across nodes [docs and 2],
a potential client (a client of dist_server, I mean here) is not part of
the atomic conflict resolution/re-registering process.
I noticed the "relay" function in Ulf's release handling tutorial [3],
but am not sure it kicks in when global detects the naming conflict
upon reconnect - I guess not, correct?
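One thing I could imagine trying instead of the default kill is passing my own resolve function to global:register_name/3, so that the losing process is told about the conflict and gets a chance to hand over what it has buffered. This is only a sketch, untested against the tutorial code; the module name, the message format and the "keep the smaller pid" rule are my own assumptions.

```erlang
-module(conflict_sketch).
-export([register/2, resolve/3]).

%% Register Pid under Name with a custom resolver instead of the
%% default global:random_exit_name/3.
register(Name, Pid) ->
    global:register_name(Name, Pid, fun ?MODULE:resolve/3).

%% global calls this with the two conflicting pids once the partitions
%% reconnect. It must return the pid that keeps the name.
%% Assumption: keeping the smaller pid (term order) is an arbitrary but
%% deterministic choice made for this sketch.
resolve(Name, Pid1, Pid2) ->
    Keep = min(Pid1, Pid2),
    Drop = max(Pid1, Pid2),
    %% Hypothetical message: the loser learns who won, so it can forward
    %% its buffered requests to Keep before terminating itself.
    Drop ! {name_conflict_lost, Name, Keep},
    Keep.
```

The losing process would still have to handle {name_conflict_lost, Name, Keep} itself, and clients that cached the old pid are not covered by this. global also ships global:random_notify_name/3 and global:notify_all_name/3, which notify instead of killing.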
** Question 3 ** - somewhat related to the above:
Is there any library support for "majority voting" and/or "lease
management" in OTP that I've not discovered yet? In particular I'm
interested in rejecting a global:register_name/2 call if the calling
process is not on a node that belongs to the majority set.
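To make the question concrete, here is a minimal sketch of the kind of guard I have in mind, wrapped around global:register_name/2 (the actual registration call in global). It assumes the full cluster membership is known statically and passed in as ClusterNodes; the module and function names are made up for illustration.

```erlang
-module(majority_reg).
-export([register_name/3, in_majority/2]).

%% True if strictly more than half of ClusterNodes appear in Visible.
in_majority(ClusterNodes, Visible) ->
    Up = [N || N <- ClusterNodes, lists:member(N, Visible)],
    2 * length(Up) > length(ClusterNodes).

%% Register only when the local node currently sees a majority of the
%% (assumed static) cluster. Returns yes | no from global, or
%% {error, no_majority} when the guard rejects the call.
register_name(Name, Pid, ClusterNodes) ->
    case in_majority(ClusterNodes, [node() | nodes()]) of
        true  -> global:register_name(Name, Pid);
        false -> {error, no_majority}
    end.
```

For example, with ClusterNodes = [n1@host, n2@host, n3@host], a node that can only see itself would get {error, no_majority}. The check and the registration are not atomic, so a partition that happens between the two steps is not caught; that gap is exactly why I'm asking about library support.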
Thanks,
- Reto
References:
[1] Thomas Arts et al., http://www.ericsson.com/cslab/~thomas/publ2.shtml
    (resource locker case study)
[2] Thomas Arts et al.,
    http://www.erlang.org/ml-archive/erlang-questions/200107/msg00031.html
    (christian paper)
[3] Ulf Wiger, OTP release handling tutorial (was on the newsgroup,
    cannot find the reference right now)
______________________
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies.
C.A.R. Hoare
1980 Turing Award Lecture