[erlang-questions] large Erlang clusters

Matt Williamson <>
Sun Aug 17 18:37:59 CEST 2008


Yes, I regret that I haven't got that sort of experience to share with you.
I'd *really *like to hear of some as well, as I'm working on an open project
that is using distributed Erlang for all it's inter-node communication.

If you want to play it safe, I hear that using a DHT, something like chord <
http://en.wikipedia.org/wiki/Chord_(DHT)>, is a proven scalable method. To
make a ring, you could still use distributed Erlang if you set
connect_allto false and use ets to keep track of neighbors.

On Sun, Aug 17, 2008 at 8:22 AM, Serge Aleynikov <> wrote:

> Thanks Matt, I am indeed familiar with that net_kernel's setting.
>
> My current reasoning for large clusters is as follows.  If all inter-node
> communications are quite busy, this net_kernel's pinging doesn't add any
> extra overhead for them.  However if several busy nodes are also connected
> to a large number of quiet nodes, they'll have to coop with extra background
> pinging load.
>
> Perhaps that load is not to worry about when the "close to" real-time
> response is expected of the "busy" nodes, as in case of an I/O busy node
> signaling select/poll calls will likely happen simultaneously on several
> file descriptors, with some of them carrying pinging payload.  As a result
> the total number of system calls involved in processing these events
> wouldn't be as many as if the events were spatially separated. However,
> since all net_ticktime logic happens in the bytecode, in a busy node it
> steals CPU cycles from other time-sensitive code, leading to increased
> latencies.  This may also be a constraint if other OS processes on the same
> host have higher priority then the emulator and the emulator is not allowed
> to run with SMP mode on so that it doesn't compete with CPU resources of
> other more critical processes.
>
> This background load can be reduced by either bumping up the net_ticktime
> setting (this is quite tricky if part of nodes are located in one subnet and
> others are in a more distant network - unfortunately with this feature *all*
> nodes in the cluster must have the same setting) or making sure that
> global/net_kernel don't maintain a full mesh of interconnected nodes using
> global_groups. In the past I used the later approach (together with
> dist_auto_connect net_kernel's setting) to make Erlang applications work
> through firewalls.  In case of node partitioning applications that take
> advantage of failover and global registration must be careful not to deal
> with several global groups as global names may be moved between them during
> failover.
>
> I was hoping to hear in this thread experiences of those currently running
> large Erlang clusters in production.  In my prior Erlang deployments I
> haven't had cases of more than 30-40 node clusters (where total number of OS
> processes were much larger than that using ei/tcp to connect to Erlang, but
> the Erlang applications would not spread over more than 40 hosts).  Since
> now there is a case for running a 400 node cluster (an instance of a VM per
> host) in a much larger local network (with a growth potential to about
> 600-700 nodes), I want to make sure that I won't miss something obvious to
> those who had similar experiences.
>
> Regards,
>
> Serge
>
>
> Matt Williamson wrote:
>
>> If you check out net_ticktime in the kernel_app docs, (you can set it with
>> net_kernel:set_net_ticktime/1,2), you'll see:
>>
>> "Once every TickTime/4 second, all connected nodes are ticked (if anything
>> else has been written to a node) and if nothing has been received from
>> another node within the last four (4) tick times that node is considered
>> to
>> be down..."
>>
>> The default ticktime is 60s, meaning a ping every 15 seconds.
>>
>> On Sat, Aug 16, 2008 at 5:24 PM, Serge Aleynikov <>
>> wrote:
>>
>>  I suppose that the problem with the max number of sockets is solved by
>>> tweaking session limits (ulimit) and using kernel poll (+K true).
>>>
>>> As I understand, in a 600 node cluster every node will maintain
>>> connections to the rest 599 nodes, and send periodic pings.  So, that
>>> pinging overhead would be something in the order of 10 events per second
>>>  per node in this configuration.  While the number doesn't seem
>>> intimidating I wonder if that overhead becomes noticeable in large
>>> network configurations and if there are any other guidelines that help
>>> architect such large network clusters to keep background load minimal.
>>>
>>> Serge
>>>
>>> Viktor Sovietov wrote:
>>>
>>>> Hi Serge
>>>>
>>>> As far as I know you're only limited with the maximum number of sockets
>>>> which are available on your system and with number of atoms which can be
>>>> used as node names.
>>>> We tested 600 nodes cluster, but I honestly can't recall if there were
>>>>
>>> any
>>>
>>>> patches to BEAM to increase mentioned parameters.
>>>>
>>>> Sincerely,
>>>>
>>>> --Viktor
>>>>
>>>>
>>>> Serge Aleynikov-2 wrote:
>>>>
>>>>> Does any one have experience running somewhere between 200 and 400
>>>>> nodes
>>>>> in production?  I recall that Erlang distributed layer had a limit of
>>>>> 256 nodes.  Is it still the case?
>>>>>
>>>>> I suppose that partitioning the cluster in several global_groups should
>>>>> limit the network load and the number of open file descriptors on each
>>>>> node would be reduced.
>>>>>
>>>>> Are there any other concerns one should be aware of when working with
>>>>> such large clusters.
>>>>>
>>>>> Serge
>>>>> _______________________________________________
>>>>> erlang-questions mailing list
>>>>> 
>>>>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>>>>
>>>>>
>>>>>  _______________________________________________
>>> erlang-questions mailing list
>>> 
>>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20080817/cf86c38a/attachment.html>


More information about the erlang-questions mailing list