connecting nodes

Ulf Wiger etxuwig@REDACTED
Mon Apr 9 13:20:17 CEST 2001

On Mon, 9 Apr 2001, Martin Bjorklund wrote:

>A worse situation is if
>you have at least three nodes, and because of network/host loads, one
>of the TCP connections times out.  In this case, you don't have a
>fully connected net anymore, and global stops working(*).  Probably
>Mnesia as well.  This is a big defect in global(**).  [In our system,
>each node runs a 'pinger' process, which starts to periodically ping
>each node as it goes down, until it either comes back up, or is
>removed from the system.  Once it's up again, you might end up with
>a partitioned network which regained its contact, which is another
>difficult problem to solve.  We solve it by restarting one of the
>partitions, and some db magic :) ]

In our system, the AXD 301, we do something similar, but also
enable the flag '-kernel dist_auto_connect once', in order to
handle partitioned networks in a controlled manner. This flag
makes sure that two nodes can't reconnect, once separated, without
at least one of the nodes restarting. In addition to this, we have
a "backdoor ping" (UDP-based) to detect communication failures:
if we get a ping from a known node that's not in the node list,
we have a partitioned network.
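
The check itself can be sketched roughly as follows. This is only an
illustration, not the AXD 301 code: the module name backdoor_ping, the
function classify, and the return atoms are made up for the example,
and the UDP transport that delivers the ping is left out. Known is
assumed to be the configured list of nodes that belong to the system.

```erlang
-module(backdoor_ping).
-export([classify/2, classify/3]).

%% Hypothetical helper: decide what a backdoor ping from node From means.
%% Known is the (assumed) configured list of nodes in the system.
%% classify/2 compares against the live distribution state, nodes().
classify(From, Known) ->
    classify(From, Known, nodes()).

%% classify/3 takes the connected-node list explicitly, which makes the
%% rule easy to try out without a running cluster.
classify(From, Known, Connected) ->
    case {lists:member(From, Known), lists:member(From, Connected)} of
        {false, _}    -> unknown_node;  % not one of ours, ignore it
        {true, true}  -> connected;     % normal case, nothing to do
        {true, false} -> partitioned    % known node, but not in the node list
    end.
```

With the node started as 'erl -kernel dist_auto_connect once', a
partitioned result is where one side would be restarted, since the two
nodes will not reconnect on their own.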

One way to handle the auto-connect problem could be to let mnesia
connect. If your system is set up so that you have a few mnesia 
nodes that handle the persistent database, and other nodes that 
just run diskless mnesia clients, you can start the diskless
clients with '-mnesia extra_db_nodes <persistent nodes>'. Then,
the diskless clients will attempt to find at least one of the 
persistent nodes in order to retrieve the mnesia schema.
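
As a config file, the same setting might look something like this for
a diskless client. The node names 'db1@host1' and 'db2@host2' are
placeholders for whatever your persistent nodes are called, and the
file would be loaded with 'erl -config <file>'.

```erlang
%% sys.config for a diskless mnesia client (node names are placeholders)
[
 {mnesia, [
   %% nodes to contact at startup in order to fetch the schema
   {extra_db_nodes, ['db1@host1', 'db2@host2']}
 ]}
].
```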

Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB
