connecting nodes

Mon Apr 9 09:32:08 CEST 2001

"Vlad Dumitrescu" <vladdu@REDACTED> wrote:
> 
> ***
> As far as I can tell, there isn't a way to automatically connect
> nodes. In order to (for example) access a global server and allow
> for its node to crash and come back online transparently, one has
> to know that node's name and its host name.

There are two sides of this problem.  

First, there is the initial connection.  For this, you'll have to
provide the node name to the system in some way.  Once you have done
this, global can be used to automatically set up a fully connected
net.  There is no auto-discovery mechanism in the standard
distribution, but it is quite simple to write your own, using either
broadcast or multicast.  [We're using a broadcast mechanism for some
nodes in our systems.]  You might have to think about security issues
though.  It all depends on the application.
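
To give an idea of how little is needed for broadcast discovery, here
is a minimal sketch.  It is not the mechanism we run in our systems;
the module name, the UDP port and the packet format (just the node
name as text) are assumptions made up for the illustration.

```erlang
-module(bcast_discovery).
-export([listen/0, announce/0]).

-define(PORT, 45892).  % arbitrary UDP port, an assumption

%% Spawn a listener that tries to connect to every node it hears about.
listen() ->
    spawn(fun() ->
        {ok, Sock} = gen_udp:open(?PORT, [binary, {active, true},
                                          {reuseaddr, true}]),
        loop(Sock)
    end).

loop(Sock) ->
    receive
        {udp, Sock, _Ip, _Port, Bin} ->
            %% NOTE: turning network input into atoms is unsafe on an
            %% untrusted network; validate the packet in real code.
            Node = binary_to_atom(Bin, utf8),
            case Node =:= node() of
                true  -> ok;                   % our own announcement
                false -> net_adm:ping(Node)    % pong | pang, ignored here
            end,
            loop(Sock);
        _Other ->
            loop(Sock)
    end.

%% Broadcast our own node name once on the local network.
announce() ->
    {ok, Sock} = gen_udp:open(0, [binary, {broadcast, true}]),
    ok = gen_udp:send(Sock, {255,255,255,255}, ?PORT,
                      atom_to_binary(node(), utf8)),
    gen_udp:close(Sock).
```

Each node would call bcast_discovery:listen() at startup and then
bcast_discovery:announce().  The nodes still need the same cookie to
connect, and anyone on the segment can send such packets, which is
where the security issues come in.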

Second, once all nodes have contact, you'd like to make sure that all
nodes keep their connections.  If one node crashes and restarts,
you're back to the initial connection, which can be handled.  A worse
situation is if you have at least three nodes and, because of
network/host load, one of the TCP connections times out.  In this
case, you don't have a fully connected net anymore, and global stops
working[*].  Probably Mnesia as well.  This is a big defect in
global[**].  [In our system, each node runs a 'pinger' process, which
starts pinging a node periodically as soon as it goes down, until it
either comes back up or is removed from the system.  Once it's up
again, you might end up with a partitioned network which has regained
contact, which is another difficult problem to solve.  We solve it by
restarting one of the partitions, and some db magic :) ]
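
A 'pinger' along those lines could look roughly like the sketch below.
This is not our actual code: the module name, the default interval and
the {forget, Node} message (used to remove a node from the system) are
made up for the example.  It relies on net_kernel:monitor_nodes/1 to
get nodeup/nodedown messages and on net_adm:ping/1 to retry.

```erlang
-module(pinger).
-export([start/0, start/1]).

-define(INTERVAL, 5000).  % milliseconds between retries, an assumption

start() ->
    start(?INTERVAL).

start(Interval) ->
    spawn(fun() ->
        %% Ask the kernel for {nodeup, Node} / {nodedown, Node} messages.
        ok = net_kernel:monitor_nodes(true),
        loop(Interval, [])
    end).

%% Down is the list of nodes we have lost and still want back.
loop(Interval, Down) ->
    receive
        {nodedown, Node} ->
            loop(Interval, lists:usort([Node | Down]));
        {nodeup, Node} ->
            loop(Interval, lists:delete(Node, Down));
        {forget, Node} ->
            %% Hypothetical message: the node was removed from the system.
            loop(Interval, lists:delete(Node, Down))
    after Interval ->
        %% No message for Interval ms: retry every node that is down.
        %% Keep only those that still answer pang.
        StillDown = [N || N <- Down, net_adm:ping(N) =:= pang],
        loop(Interval, StillDown)
    end.
```

Note that the timeout restarts whenever a message arrives, and that
net_adm:ping/1 can block for several seconds per unreachable node, so
a real version would want a proper timer.  The sketch does nothing
about the partitioned network problem either.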

[*] Unfortunately, it doesn't even detect this situation, so the
result might be that the name registry becomes inconsistent, or that
global:sync() hangs (which means that the global handshake procedure
has hung or failed), or that it crashes (which is the best of the
three).

[**] Since I designed one incarnation of global, you can blame me ;)

> I don't really like the idea of hardcoding the node/host names...

You should never have to do that of course.  [In our system, each node
is added by an operator (he doesn't know he's adding an Erlang node of
course), which provides the IP address of another box in the system.
We contact the node on that box, and store the new node name in the
configuration files in the rest of the system.  Auto-discovery could
be used instead, and we probably will do that for some special systems
in the future.]
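
One possible way to get from an IP address to a node name, sketched
below, is to ask epmd on that box which nodes are registered there.
This is only an illustration and not necessarily how our system does
it.  It assumes that all nodes run with long names that use the IP
address as host part (e.g. -name foo@10.0.0.5); the module and function
names are made up.

```erlang
-module(join_by_ip).
-export([join/1]).

%% join({10,0,0,5}) returns the list of nodes on that host that we
%% managed to connect to, or {error, Reason} if epmd did not answer.
join(Ip) when is_tuple(Ip) ->
    Host = inet:ntoa(Ip),                       % e.g. "10.0.0.5"
    case net_adm:names(Ip) of
        {ok, Names} ->
            %% Names is a list of {NameString, Port} from epmd.
            Nodes = [list_to_atom(Name ++ "@" ++ Host)
                     || {Name, _Port} <- Names],
            [N || N <- Nodes, net_adm:ping(N) =:= pong];
        {error, Reason} ->
            {error, Reason}
    end.
```

The node names returned could then be written to the configuration
files on the other nodes, which is the part the sketch leaves out.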

/martin