[erlang-questions] massive distribution

Garrett Smith g@REDACTED
Tue Dec 1 17:27:17 CET 2009


On Tue, Dec 1, 2009 at 8:22 AM, Peter Sabaini <peter@REDACTED> wrote:
> On Tue, 2009-12-01 at 09:08 -0500, Kevin A. Smith wrote:
>> Fully connected meshes suck for large numbers of nodes. Erlang provides a number of
>> knobs to control how a cluster is stitched together such as "-connect_all false"
>> and "-hidden".
>
> Which would entail keeping track of connected nodes and connection
> establishment/teardown, correct?
>
>> Also, tuning the net tick time (see man 3 net_kernel and man 6 kernel) can be helpful
>> in keeping a large cluster running.
>
> I fiddled around with those a bit. I don't have the exact values at
> hand, but I set net_ticktime to rather large values, something like
> 300s, without substantial improvements in the number of nodes able to
> keep a stable connection.

What is actually happening when a connection becomes unstable?

I have a mesh of several dozen nodes and the connections can drop at
any time given the basic unreliability of network connections. Each
node, however, is responsible for trying to reestablish a connection
to a well-known 'hub', which tends to keep the mesh intact even when
some nodes fall off occasionally. (This is a single point of failure,
but the 'hub' could easily be a list, like DNS.)
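
A minimal sketch of the reconnect-to-hub idea described above, not the
actual code running on that mesh. The module name hub_keeper, the
5-second default interval, and the list-of-hubs argument are all
assumptions made for illustration; the only library calls used are
nodes/1 and net_kernel:connect_node/1.

```erlang
-module(hub_keeper).
-export([start/1, start/2, missing_hubs/2]).

%% Hypothetical helper: spawn a process that keeps this node attached to
%% the hubs in Hubs, rechecking every IntervalMs milliseconds.
start(Hubs) ->
    start(Hubs, 5000).

start(Hubs, IntervalMs) when is_list(Hubs), is_integer(IntervalMs) ->
    spawn(fun() -> loop(Hubs, IntervalMs) end).

%% Pure part: which hubs are we currently not connected to?
%% The local node is skipped so that a hub never tries to dial itself.
missing_hubs(Hubs, Connected) ->
    [H || H <- Hubs, H =/= node(), not lists:member(H, Connected)].

loop(Hubs, IntervalMs) ->
    %% nodes(connected) also includes hidden nodes; nodes() alone does not.
    Missing = missing_hubs(Hubs, nodes(connected)),
    %% connect_node/1 returns true, false, or ignored (local node not
    %% alive); a failed attempt is simply retried on the next pass.
    lists:foreach(fun(H) -> net_kernel:connect_node(H) end, Missing),
    receive
        stop -> ok
    after IntervalMs ->
        loop(Hubs, IntervalMs)
    end.
```

Something like hub_keeper:start(['hub1@example', 'hub2@example']) on
each node would cover the "list of hubs" variant; those node names are
placeholders.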

I've found that setting -connect_all false disables the global process
registry, which makes the setting practically useless. I'm guessing I've
missed something here. What is the approach to keeping the global
registry in sync when -connect_all false is set?

I've also read about, but not explored, a pattern of segmenting a mesh
into smaller groups of nodes. From what I understand -- that each node
tries to connect to each node -- a mesh has n(n-1)/2 connections, so
80 nodes would imply 3000+ connections. For most applications, that's
a lot of unneeded overhead -- not every node is going to need to talk
to every other node.
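
To make the arithmetic concrete, here is a small sketch of the
n(n-1)/2 count for a full mesh, next to one possible segmented layout.
The segmented/2 layout (each group fully meshed internally, plus one
gateway node per group meshed with the other gateways) is purely an
assumption for comparison, not a scheme taken from any particular
system; the module and function names are made up.

```erlang
-module(mesh_links).
-export([full_mesh/1, segmented/2]).

%% Connections in a fully connected mesh of N nodes: N*(N-1)/2.
full_mesh(N) when is_integer(N), N >= 0 ->
    N * (N - 1) div 2.

%% Assumed layout: Groups groups of Size nodes each. Every group is a
%% full mesh internally, and one gateway node per group is meshed with
%% the gateways of all the other groups.
segmented(Groups, Size) when is_integer(Groups), Groups >= 0,
                             is_integer(Size), Size >= 0 ->
    Groups * full_mesh(Size) + full_mesh(Groups).
```

With these definitions, mesh_links:full_mesh(80) gives 3160, while
mesh_links:segmented(8, 10) gives 388 for the same 80 nodes.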

When networks are small, Erlang's global process registration and
lookup facility is phenomenal. But the out-of-the-box scheme
definitely presents challenges in large networks.

I'm definitely curious to know how others have dealt with this type of problem.

Garrett

