[erlang-questions] large Erlang clusters
Sun Aug 17 14:22:00 CEST 2008
Thanks Matt, I am indeed familiar with that net_kernel's setting.
My current reasoning for large clusters is as follows. If all
inter-node communications are quite busy, this net_kernel's pinging
doesn't add any extra overhead for them. However if several busy nodes
are also connected to a large number of quiet nodes, they'll have to
coop with extra background pinging load.
Perhaps that load is not to worry about when the "close to" real-time
response is expected of the "busy" nodes, as in case of an I/O busy node
signaling select/poll calls will likely happen simultaneously on several
file descriptors, with some of them carrying pinging payload. As a
result the total number of system calls involved in processing these
events wouldn't be as many as if the events were spatially separated.
However, since all net_ticktime logic happens in the bytecode, in a busy
node it steals CPU cycles from other time-sensitive code, leading to
increased latencies. This may also be a constraint if other OS
processes on the same host have higher priority then the emulator and
the emulator is not allowed to run with SMP mode on so that it doesn't
compete with CPU resources of other more critical processes.
This background load can be reduced by either bumping up the
net_ticktime setting (this is quite tricky if part of nodes are located
in one subnet and others are in a more distant network - unfortunately
with this feature *all* nodes in the cluster must have the same setting)
or making sure that global/net_kernel don't maintain a full mesh of
interconnected nodes using global_groups. In the past I used the later
approach (together with dist_auto_connect net_kernel's setting) to make
Erlang applications work through firewalls. In case of node
partitioning applications that take advantage of failover and global
registration must be careful not to deal with several global groups as
global names may be moved between them during failover.
I was hoping to hear in this thread experiences of those currently
running large Erlang clusters in production. In my prior Erlang
deployments I haven't had cases of more than 30-40 node clusters (where
total number of OS processes were much larger than that using ei/tcp to
connect to Erlang, but the Erlang applications would not spread over
more than 40 hosts). Since now there is a case for running a 400 node
cluster (an instance of a VM per host) in a much larger local network
(with a growth potential to about 600-700 nodes), I want to make sure
that I won't miss something obvious to those who had similar experiences.
Matt Williamson wrote:
> If you check out net_ticktime in the kernel_app docs, (you can set it with
> net_kernel:set_net_ticktime/1,2), you'll see:
> "Once every TickTime/4 second, all connected nodes are ticked (if anything
> else has been written to a node) and if nothing has been received from
> another node within the last four (4) tick times that node is considered to
> be down..."
> The default ticktime is 60s, meaning a ping every 15 seconds.
> On Sat, Aug 16, 2008 at 5:24 PM, Serge Aleynikov <> wrote:
>> I suppose that the problem with the max number of sockets is solved by
>> tweaking session limits (ulimit) and using kernel poll (+K true).
>> As I understand, in a 600 node cluster every node will maintain
>> connections to the rest 599 nodes, and send periodic pings. So, that
>> pinging overhead would be something in the order of 10 events per second
>> per node in this configuration. While the number doesn't seem
>> intimidating I wonder if that overhead becomes noticeable in large
>> network configurations and if there are any other guidelines that help
>> architect such large network clusters to keep background load minimal.
>> Viktor Sovietov wrote:
>>> Hi Serge
>>> As far as I know you're only limited with the maximum number of sockets
>>> which are available on your system and with number of atoms which can be
>>> used as node names.
>>> We tested 600 nodes cluster, but I honestly can't recall if there were
>>> patches to BEAM to increase mentioned parameters.
>>> Serge Aleynikov-2 wrote:
>>>> Does any one have experience running somewhere between 200 and 400 nodes
>>>> in production? I recall that Erlang distributed layer had a limit of
>>>> 256 nodes. Is it still the case?
>>>> I suppose that partitioning the cluster in several global_groups should
>>>> limit the network load and the number of open file descriptors on each
>>>> node would be reduced.
>>>> Are there any other concerns one should be aware of when working with
>>>> such large clusters.
>>>> erlang-questions mailing list
>> erlang-questions mailing list
More information about the erlang-questions