connecting nodes
Ulf Wiger
etxuwig@REDACTED
Mon Apr 9 13:20:17 CEST 2001
On Mon, 9 Apr 2001, Martin Bjorklund wrote:
>A worse situation is if
>you have at least three nodes, and because of network/host loads, one
>of the TCP connections times out. In this case, you don't have a
>fully connected net anymore, and global stops working(*). Probably
>Mnesia as well. This is a big defect in global(**). [In our system,
>each node runs a 'pinger' process, which starts to periodically ping
>each node as it goes down, until it either comes back up, or is
>removed from the system. Once it's up again, you might end up with
>a partitioned network which regained its contact, which is another
>difficult problem to solve. We solve it by restarting one of the
>partitions, and some db magic :) ]
In our system, the AXD 301, we do something similar, but also
enable the flag '-kernel dist_auto_connect once', in order to
handle partitioned networks in a controlled manner. This flag
makes sure that two nodes can't reconnect, once separated, without
at least one of the nodes restarting. In addition to this, we have
a "backdoor ping" (UDP-based) to detect communication failures:
if we get a ping from a known node that's not in the node list,
we have a partitioned network.
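
As a rough illustration of the check described above (not the AXD 301 code), the decision could be sketched like this. It assumes the UDP ping carries the sender's node name; the module name, function names and return atoms are made up for the example.

```erlang
-module(partition_check).
-export([classify/3, on_ping/2]).

%% classify(From, Known, Connected) -> ok | partitioned | unknown
%%   From:      node name carried in the backdoor ping (assumed payload)
%%   Known:     nodes configured as members of the system
%%   Connected: nodes currently visible over distributed Erlang
classify(From, Known, Connected) ->
    case lists:member(From, Known) of
        false ->
            %% Not one of ours; ignore it.
            unknown;
        true ->
            case lists:member(From, Connected) of
                true  -> ok;           % reachable both ways
                false -> partitioned   % alive, but not in the node list
            end
    end.

%% Convenience wrapper that checks against the live node list.
on_ping(From, Known) ->
    classify(From, Known, nodes()).
```

What to do on 'partitioned' (for instance, which side restarts) is a policy decision and is left out of the sketch.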
One way to handle the auto-connect problem could be to let mnesia
connect. If your system is set up so that you have a few mnesia
nodes that handle the persistent database, and other nodes that
just run diskless mnesia clients, you can start the diskless
clients with '-mnesia extra_db_nodes <persistent nodes>'. Then,
the diskless clients will attempt to find at least one of the
persistent nodes in order to retrieve the mnesia schema.
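
The same setting can also be given in a config file instead of on the command line. A minimal sketch, where 'db1@host1' and 'db2@host2' are placeholder names standing in for the persistent nodes:

```erlang
%% sys.config fragment for a diskless client node.
%% The node names below are placeholders, not real hosts.
[
 {mnesia, [
   %% Nodes to contact at startup in order to fetch the schema.
   {extra_db_nodes, ['db1@host1', 'db2@host2']}
 ]}
].
```

The client would then be started with something like 'erl -sname client1 -config sys', where the node name is again only an example.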
/Uffe
--
Ulf Wiger tfn: +46 8 719 81 95
Senior System Architect mob: +46 70 519 81 95
Strategic Product & System Management ATM Multiservice Networks
Data Backbone & Optical Services Division Ericsson Telecom AB