[erlang-questions] massive distribution

Evans, Matthew mevans@REDACTED
Wed Dec 2 17:54:55 CET 2009


I'm entering this thread a bit late.

One thing I have noticed when running on certain processors is that the TCP connection becomes a bottleneck if there is a lot of traffic. There can even be issues at low load caused by whether or not TCP_NODELAY is set (that can be resolved by modifying the distribution's TCP kernel options - for example {dist_nodelay, true}).
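For reference, a kernel option like that one can be set in a sys.config file. This is only a sketch based on the option named above - check the kernel application docs for your OTP release before relying on it:

```erlang
%% sys.config -- sketch only; dist_nodelay is the kernel option
%% mentioned above (verify against your OTP release's kernel docs).
[
 {kernel, [
   %% Ask the distribution carrier to set TCP_NODELAY on
   %% inter-node connections.
   {dist_nodelay, true}
 ]}
].
```

Started with something like `erl -config sys -name foo@host`.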

As for the load issue: one thing that would be nice is the ability to specify a "pool" of TCP connections between VMs, rather than a single one - especially with lots of processes, where we could be hitting locks in the kernel. If this is possible today, please let me know how to do it :-)

-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On Behalf Of Garrett Smith
Sent: Wednesday, December 02, 2009 10:57 AM
To: Peter Sabaini
Cc: Erlang Users' List
Subject: Re: [erlang-questions] massive distribution

On Wed, Dec 2, 2009 at 4:07 AM, Peter Sabaini <peter@REDACTED> wrote:
> On Tue, 2009-12-01 at 16:34 -0600, Garrett Smith wrote:
>> On Tue, Dec 1, 2009 at 3:44 PM, Peter Sabaini <peter@REDACTED> wrote:
>> > On Tue, 2009-12-01 at 10:27 -0600, Garrett Smith wrote:
>> >> What is happening that makes something an unstable connection?
>> >
>> > The behaviour was that nodes seemed to randomly produce error messages,
>> > e.g.:
>> >
>> > =ERROR REPORT==== 9-Jul-2009::13:56:07 ===
>> > The global_name_server locker process received an unexpected message:
>> > {{#Ref<0.0.0.1957>,'xy@REDACTED'},false}
>> >
>> > Or
>> >
>> > =ERROR REPORT==== 9-Jul-2009::14:03:33 ===
>> > global: 'foo@REDACTED' failed to connect to 'qux@REDACTED'
>>
>> Hmm...not to say the node count isn't part of the problem, but there
>> are *lots* of reasons this could happen, none of which have anything
>> to do with Erlang.
>
> My evidence is that these problems appeared (with little load) when I
> increased the node count, and disappeared (even under heavy load) once I
> stayed below a threshold (things seemed stable with 64 nodes in my
> case). I didn't investigate further, though, as I was more interested in
> the behaviour of my application.

This is good input and a bit of a stab (well, poke) in the heart of
Erlang's "distributed" story. 100+ nodes may historically have been a
large cluster, but that's quickly changing. It's not unlike the set of
problems kicked up by the new large multicore systems (e.g. the
process-affinity threads that keep popping up here).

Taking the global process registry alone -- and this seems to be a
cornerstone of distributed Erlang -- you'd need a robust peer-to-peer
replication strategy that survives constant linear growth of the
network. Or maybe this is nonsense and one must concede that Erlang's
out-of-the-box location transparency stops at around 100 nodes over
TCP/IP.
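For context, the registry in question is the one behind global:register_name/2, whose registrations are replicated to every connected node - which is exactly what gets expensive as the cluster grows. A minimal illustration (the name 'my_service' is made up for the example):

```erlang
%% Minimal sketch of the global name registry discussed above.
%% 'my_service' is a hypothetical name chosen for illustration.
start() ->
    Pid = spawn(fun loop/0),
    %% The registration is propagated to every node in the
    %% cluster; returns 'yes' on success, 'no' on a name clash.
    yes = global:register_name(my_service, Pid),
    Pid.

loop() ->
    receive
        {From, ping} -> From ! pong, loop()
    end.

%% From any connected node:
%%   global:whereis_name(my_service) ! {self(), ping}.
```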

Garrett

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org
