gen_server clustering strategy

Tue Feb 1 18:25:00 CET 2011

Hello list,

We are interested in understanding how the Erlang distribution
mechanism behaves in a fully connected cluster. Our concern is that as
we add nodes, the overhead will become a problem.

Our application is a bunch of gen_servers containing state, which we
need to run on multiple nodes due to memory usage. A request will come
to our application and it needs to be handled by the correct
gen_server. Every request includes a user id which we can use to map
to the process.

We have two specific use cases in mind.

1) Proxy to workers

In this scenario we have a proxy (or multiple proxies) accepting
requests at the edge of our cluster. The proxy will ask the correct
server for a response. Either through using the 'global' module or
gproc or something similar, the proxy node will keep track of the
mapping of user ids to process and nodes. The proxy will call the node
using the Erlang distribution protocol.

2) Mesh

In this scenario a request can be handled by any node in the cluster.
If the request cannot be handled by the local node, the correct node
is called on instead. Every node in the cluster needs to keep track of
which id belongs to which process.

The numbers:
 * We must be able to run 100,000 processes, each may peak at 300kb of memory
 * We expect 5000 requests coming in to our application per second at
peak, generating the same number of messages
 * Out of those 5000 messages, 1000 has a content that may peak at 300kb
 * All the other messages are well under 1kb

Our concerns:

 * In the mesh, will the traffic between the nodes become a problem?
Lets say we have 4 nodes, if the requests are evenly divided between
the nodes the probability of hitting the correct node is 25%. With 100
nodes this is 1%. As we add nodes, there will be more chatter. May
this chatter "drown-out" the important messages, like pid down, etc.

 * Will it be more efficient to use the proxy to message every node?
Will the message then always go to the node directly, or may it go
through some other node in the cluster?

 * For the mesh, we need to keep the 'global' or gproc state
replicated. With 'global' we will run into the default limit of atoms.
If we were to increase this limit, what kind of memory usage could we
expect? Are there any other nasty side-effects that we should be aware
of?

 * In the "Erlang/OTP in Action" book it is stated that you may have a
"couple of dozen, but probably not hundreds of nodes." Our
understanding is that this is because of traffic like "this pid just
died", "this node is no longer available", etc.

 * If we were to eliminate messaging between nodes, how many nodes
could we actually run in a fully connected cluster? Will a
high-latency network like ec2 affect this number or just the latency
of messages? What else may affect the size?

Regards
Knut