[erlang-questions] massive distribution

Wed Dec 2 16:56:34 CET 2009

On Wed, Dec 2, 2009 at 4:07 AM, Peter Sabaini <peter@REDACTED> wrote:
> On Tue, 2009-12-01 at 16:34 -0600, Garrett Smith wrote:
>> On Tue, Dec 1, 2009 at 3:44 PM, Peter Sabaini <peter@REDACTED> wrote:
>> > On Tue, 2009-12-01 at 10:27 -0600, Garrett Smith wrote:
>> >> What is happening that makes something an unstable connection?
>> >
>> > The behaviour was that nodes seemed to randomly produce error messages,
>> > e.g.:
>> >
>> > =ERROR REPORT==== 9-Jul-2009::13:56:07 ===
>> > The global_name_server locker process received an unexpected message:
>> > {{#Ref<0.0.0.1957>,'xy@REDACTED'},false}
>> >
>> > Or
>> >
>> > =ERROR REPORT==== 9-Jul-2009::14:03:33 ===
>> > global: 'foo@REDACTED' failed to connect to 'qux@REDACTED'
>>
>> Hmm...not to say the node count isn't part of the problem, but there
>> are *lots* of reasons this could happen, none of which have anything
>> to do with Erlang.
>
> My evidence is that these problems appeared (with little load) when I
> increased the nodecount, and disappeared (even under heavy load) when
> going beyond a threshold (seemed stable with 64 nodes in my case). I
> didn't investigate this further though as I was more interested in the
> behaviour of my application.

This is good input and a bit of a stab (well, poke) in the heart of
Erlang's "distributed" story. 100+ nodes may have historically been a
large cluster, but that's quickly changing. It's not unlike the set of
problems that get kicked up by the new large multicore systems (e.g.
the process affinity threads that keep popping up here).

Taking the global process registry alone -- and this seems to be a
cornerstone of distributed Erlang -- you'd need a robust peer-to-peer
replication strategy that survives constant linear growth of the
network. Or maybe this is nonsense and one must concede that Erlang's
out-of-the-box location transparency stops at around 100 nodes over
TCP/IP.
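
To put a rough number on the growth: by default, distributed Erlang
connects every node to every other node, so the number of links grows
quadratically even when the node count grows linearly. A minimal sketch
in Python (the function name is mine and purely illustrative; it only
counts pairwise links and does not model the locking or name
registration traffic of global):

```python
def mesh_connections(n: int) -> int:
    """Return the number of pairwise links in a fully connected cluster of n nodes."""
    if n < 0:
        raise ValueError("node count must be non-negative")
    # Each of the n nodes links to the other n - 1 nodes; every link is shared by two nodes.
    return n * (n - 1) // 2


if __name__ == "__main__":
    for nodes in (16, 64, 100, 200):
        print(f"{nodes:4d} nodes -> {mesh_connections(nodes):6d} connections")
```

Going from 64 to 100 nodes adds about 56% more nodes but more than
doubles the link count, which is at least consistent with trouble
showing up somewhere in that range.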

Garrett