[erlang-questions] gen_server clustering strategy

Evans, Matthew <>
Wed Feb 2 17:40:25 CET 2011


My worry is that the OP is suggesting using the global name server. This has a few problems:

1) He will soon run out of atom table space (I think he's assuming that gen_servers need to be registered, which is not the case).
2) Process state information has to be shared between all nodes.
3) All lookups need to go to a central gen_server (the global name server) running on a single logical core.

(I might be wrong about point 3).

But actually a hashing function could be a good idea, at least to find the correct node. This could remove the need to share process availability information across host/node boundaries.

One could maintain a list of current nodes in the mesh, and then use a hash to find the correct node. Identifier to pid still needs to be mapped, but this can be done by a local ets table (optimized and fast for lookups).

For example:

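process_inbound_request(Message, Identifier) ->
    %% Sort so that every node derives the same ordering; nodes() excludes
    %% the local node and its order is not guaranteed.
    Nodes = lists:sort([node() | nodes()]),
    Index = erlang:phash2(Identifier, length(Nodes)),
    %% A function call cannot appear in a pattern, so bind node() first
    ThisNode = node(),
    case lists:nth(Index + 1, Nodes) of
        ThisNode ->
            %% On this node, lookup locally to find the correct instance
            dispatch_locally(Message, Identifier);
        Node ->
            dispatch_remotely(Node, Message, Identifier)
    end.

dispatch_locally(Message, Identifier) ->
    case ets:lookup(local_pids, Identifier) of
        [{_, Pid}] ->
            gen_server:cast(Pid, {inbound, Message});
        _ ->
            %% No process is mapped to this identifier; drop the message
            ok
    end.

dispatch_remotely(Node, Message, Identifier) ->
    gen_server:cast({local_proxy, Node}, {request, Message, Identifier}).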

local_proxy is a locally registered gen_server on each node that implements the equivalent of the dispatch_locally function.
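
A minimal sketch of what such a local_proxy could look like, assuming it also owns the local_pids ets table. The register_worker/2 helper and the table options are assumptions made for illustration; only the local_proxy name, the local_pids table and the {request,Message,Identifier} cast come from the example above.

```erlang
-module(local_proxy).
-behaviour(gen_server).

-export([start_link/0, register_worker/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Start the proxy, registered locally under the name local_proxy.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Hypothetical helper: record which pid serves Identifier on this node.
register_worker(Identifier, Pid) ->
    gen_server:call(?MODULE, {register, Identifier, Pid}).

init([]) ->
    %% The proxy owns the table. 'protected' lets other local processes
    %% read it directly; read_concurrency favours frequent lookups.
    Tab = ets:new(local_pids,
                  [named_table, protected, set, {read_concurrency, true}]),
    {ok, Tab}.

handle_call({register, Identifier, Pid}, _From, Tab) ->
    true = ets:insert(Tab, {Identifier, Pid}),
    {reply, ok, Tab}.

%% Equivalent of dispatch_locally for requests arriving from other nodes.
handle_cast({request, Message, Identifier}, Tab) ->
    case ets:lookup(Tab, Identifier) of
        [{_, Pid}] -> gen_server:cast(Pid, {inbound, Message});
        [] -> ok
    end,
    {noreply, Tab}.

handle_info(_Info, Tab) ->
    {noreply, Tab}.
```

Since only the proxy writes to the table, inserts are serialised through one process while lookups stay cheap; entries for dead workers would still need to be removed, which this sketch does not do.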

Matt

________________________________________
From: James Churchman []
Sent: Wednesday, February 02, 2011 10:37 AM
To: Evans, Matthew
Cc: Knut Nesheim; 
Subject: Re: [erlang-questions] gen_server clustering strategy

all great questions, most of which i don't know the answers to, but queuing systems can help if a bit of extra latency is ok, and more so if the order does not have to be exact

i think at very large scale hashing becomes the only answer, and handling failed nodes becomes tricky, but a basic distributed mnesia table that contains the number of alive nodes and maybe reports on their repeated failures, and then hashes the input (on userid in your case) to the correct node would get you half way there.

Also i don't think that 100,000 processes that peak at 300kb is a problem for erlang. That's only 30GB max. If it's stored as binaries, and possibly after a really large message you invoke the garbage collector for that process (i have no idea if this is a bad idea or not, but it can be done easily and should reduce memory consumption), then a single box should be able to handle your needs easily, and 3 no probs at all. My mac pro should be ok :-)
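
A rough sketch of the garbage collection idea, assuming a worker that receives binaries via {inbound,Message} casts. The gc_worker module name, the 100 KB threshold and the message counter are assumptions for illustration; erlang:garbage_collect/0 is the standard BIF that collects the calling process.

```erlang
-module(gc_worker).
-behaviour(gen_server).

-export([start/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Assumed threshold: payloads above this size trigger an explicit collection.
-define(LARGE_MESSAGE_BYTES, (100 * 1024)).

start() ->
    gen_server:start(?MODULE, [], []).

%% State is just a count of handled messages, to keep the sketch small.
init([]) ->
    {ok, 0}.

handle_call(handled, _From, Count) ->
    {reply, Count, Count}.

handle_cast({inbound, Message}, Count) when is_binary(Message) ->
    %% ... real processing of Message would happen here ...
    case byte_size(Message) > ?LARGE_MESSAGE_BYTES of
        true ->
            %% Collect this process now so that its reference to the
            %% large binary is dropped promptly.
            erlang:garbage_collect();
        false ->
            ok
    end,
    {noreply, Count + 1}.
```

Returning {noreply, State, hibernate} from the callback is another way to get a collection, at the cost of waking the process up again on the next message. Whether either helps in practice is something to measure.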

i think dedicated hardware is always better than virtualised

As for 1000 messages at 300kb, that seems like quite a lot, bordering on what standard gigabit ethernet can handle, but just benchmark erlang, again one box might be enough... you can get a 32 core amd server for not much these days. Are these requirements realistic though, or just "if 10 million people use my service on day one that i have put together on a shoestring etc..." wishful thinking?
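
A quick back-of-envelope check of those figures, assuming "300kb" means 300 kilobytes and using decimal units (the capacity module and function names are made up for this sketch):

```erlang
-module(capacity).
-export([peak_memory_gb/0, peak_traffic_gbit/0]).

%% Figures are from the original post; "300kb" is read as 300 kilobytes
%% (an assumption, the post does not say bytes or bits).

%% 100,000 processes * 300 KB each, converted from KB to GB.
peak_memory_gb() ->
    100000 * 300 / 1.0e6.

%% 1000 large messages per second * 300 KB each * 8 bits per byte,
%% converted from kbit/s to Gbit/s.
peak_traffic_gbit() ->
    1000 * 300 * 8 / 1.0e6.
```

Under that reading the memory comes to 30 GB and the peak traffic to about 2.4 Gbit/s, which is more than a single gigabit link carries if all of it crosses the network. If "kb" means kilobits instead, the traffic is about 0.3 Gbit/s.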


On 1 Feb 2011, at 19:10, Evans, Matthew wrote:

> Without knowing much more another model would be to use your own ets/mnesia table to map the workers.
>
> I am assuming the creation of a worker is relatively infrequent (compared to the times you need to find a worker).
>
> When you create a worker use gen_server:start(?MODULE, [], []).
>
> This will return {ok,Pid}. You can save the Pid in an mnesia table along with whatever reference you need (doesn't need to be an atom). All nodes in the mesh will get the pid and the reference. Starting it as shown above (with an arity of 3) means you don't need to use the global service and ensure the names are unique atoms for each server. You can even start them via some application that will spread the gen_servers over different nodes.
>
> When a request comes in it only need do a lookup to find the correct pid.
>
> I can't recall how bad the process crash messaging is. But what you could do is monitor each gen_server locally, and monitor each node globally. When a node fails, all other nodes can clean up the processes that were registered on that node.
>
> Matt
>
> ________________________________________
> From:  [] On Behalf Of Knut Nesheim []
> Sent: Tuesday, February 01, 2011 12:25 PM
> To: 
> Subject: [erlang-questions] gen_server clustering strategy
>
> Hello list,
>
> We are interested in understanding how the Erlang distribution
> mechanism behaves in a fully connected cluster. Our concern is that as
> we add nodes, the overhead will become a problem.
>
> Our application is a bunch of gen_servers containing state, which we
> need to run on multiple nodes due to memory usage. A request will come
> to our application and it needs to be handled by the correct
> gen_server. Every request includes a user id which we can use to map
> to the process.
>
> We have two specific use cases in mind.
>
> 1) Proxy to workers
>
> In this scenario we have a proxy (or multiple proxies) accepting
> requests at the edge of our cluster. The proxy will ask the correct
> server for a response. Either through using the 'global' module or
> gproc or something similar, the proxy node will keep track of the
> mapping of user ids to process and nodes. The proxy will call the node
> using the Erlang distribution protocol.
>
> 2) Mesh
>
> In this scenario a request can be handled by any node in the cluster.
> If the request cannot be handled by the local node, the correct node
> is called on instead. Every node in the cluster needs to keep track of
> which id belongs to which process.
>
>
> The numbers:
> * We must be able to run 100,000 processes, each may peak at 300kb of memory
> * We expect 5000 requests coming in to our application per second at
> peak, generating the same number of messages
> * Out of those 5000 messages, 1000 have content that may peak at 300kb
> * All the other messages are well under 1kb
>
> Our concerns:
>
> * In the mesh, will the traffic between the nodes become a problem?
> Let's say we have 4 nodes, if the requests are evenly divided between
> the nodes the probability of hitting the correct node is 25%. With 100
> nodes this is 1%. As we add nodes, there will be more chatter. May
> this chatter "drown out" the important messages, like pid down, etc.?
>
> * Will it be more efficient to use the proxy to message every node?
> Will the message then always go to the node directly, or may it go
> through some other node in the cluster?
>
> * For the mesh, we need to keep the 'global' or gproc state
> replicated. With 'global' we will run into the default limit of atoms.
> If we were to increase this limit, what kind of memory usage could we
> expect? Are there any other nasty side-effects that we should be aware
> of?
>
> * In the "Erlang/OTP in Action" book it is stated that you may have a
> "couple of dozen, but probably not hundreds of nodes." Our
> understanding is that this is because of traffic like "this pid just
> died", "this node is no longer available", etc.
>
> * If we were to eliminate messaging between nodes, how many nodes
> could we actually run in a fully connected cluster? Will a
> high-latency network like ec2 affect this number or just the latency
> of messages? What else may affect the size?
>
>
> Regards
> Knut
>
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:


