[erlang-questions] Idea on stateful server load sharing & fail over

Evans, Matthew <>
Wed Apr 7 20:42:32 CEST 2010


Is there any reason why you couldn't have a "back end server" process per user? That would be the Erlang way of doing things. Even if there are millions of users/processes you should be fine.

The "back end server" process would save state in its state record (assuming it is a gen_server or gen_fsm), and when it is in a stable state it could checkpoint that information to ets, mnesia or a peer process.
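As a rough illustration of that checkpointing idea, here is a minimal sketch of a per-call gen_server that writes its state to ETS whenever it reaches a stable stage and reloads it on restart. The module name call_server, the table name call_checkpoints, the stage names, and the rule that only connected counts as stable are all assumptions made for this example, not anything from the original system. The table has to be created by a longer-lived process, because an ETS table is destroyed when its owner exits.

```erlang
-module(call_server).
-behaviour(gen_server).

%% Hypothetical API for this sketch.
-export([create_table/0, start/1, event/2, stage/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical checkpoint table name.
-define(TAB, call_checkpoints).

-record(state, {call_id, stage = idle}).

%% Call this from a long-lived process: the table dies with its owner,
%% so it must not be owned by an individual call server.
create_table() ->
    ets:new(?TAB, [named_table, public, set]).

start(CallId) ->
    gen_server:start(?MODULE, CallId, []).

event(Pid, Stage) ->
    gen_server:call(Pid, {event, Stage}).

stage(Pid) ->
    gen_server:call(Pid, get_stage).

%% On (re)start, resume from the last checkpoint if one exists.
init(CallId) ->
    State = case ets:lookup(?TAB, CallId) of
                [{CallId, Saved}] -> Saved;
                []                -> #state{call_id = CallId}
            end,
    {ok, State}.

handle_call({event, Stage}, _From, State) ->
    New = State#state{stage = Stage},
    %% Checkpoint only when the call is in a stable stage.
    case is_stable(Stage) of
        true  -> ets:insert(?TAB, {New#state.call_id, New});
        false -> ok
    end,
    {reply, ok, New};
handle_call(get_stage, _From, State) ->
    {reply, State#state.stage, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.

%% Assumption for this sketch: only 'connected' is a stable stage.
is_stable(connected) -> true;
is_stable(_)         -> false.
```

With this sketch, a server that is killed while in a transient stage comes back in the last stable stage it checkpointed. Swapping ETS for mnesia or a peer process would change only the lookup in init/1 and the insert in handle_call/3.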

You could have an ETS table that maps the call id string to the back end server pid, and use erlang:monitor/2 to detect when a back end server dies. If it does, restart it or remap its entry in the ETS table to another process.
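A minimal sketch of that registry, assuming a single gen_server that owns the ETS table and monitors every back end process it starts. The module name call_registry, the table name, and the trivial worker_loop/1 standing in for a real back end server are all hypothetical. Restarting on every abnormal exit is a simplification; a real system would probably decide based on the exit reason and restore checkpointed state.

```erlang
-module(call_registry).
-behaviour(gen_server).

%% Hypothetical API for this sketch.
-export([start_link/0, lookup/1, ensure_started/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical table name; rows are {CallId, Pid}.
-define(TAB, call_registry_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Dispatchers can read the table directly without going through
%% the registry process.
lookup(CallId) ->
    case ets:lookup(?TAB, CallId) of
        [{CallId, Pid}] -> {ok, Pid};
        []              -> not_found
    end.

ensure_started(CallId) ->
    gen_server:call(?MODULE, {ensure_started, CallId}).

%% Server state is a map from monitor reference to call id.
init([]) ->
    ets:new(?TAB, [named_table, protected, set]),
    {ok, #{}}.

handle_call({ensure_started, CallId}, _From, Refs) ->
    case lookup(CallId) of
        {ok, Pid} ->
            {reply, {ok, Pid}, Refs};
        not_found ->
            {Pid, Refs1} = start_and_register(CallId, Refs),
            {reply, {ok, Pid}, Refs1}
    end.

handle_cast(_Msg, Refs) -> {noreply, Refs}.

%% A monitored back end process has died.
handle_info({'DOWN', Ref, process, _Pid, Reason}, Refs) ->
    case maps:take(Ref, Refs) of
        {CallId, Refs1} when Reason =:= normal ->
            %% The call ended cleanly: drop the mapping.
            ets:delete(?TAB, CallId),
            {noreply, Refs1};
        {CallId, Refs1} ->
            %% Abnormal exit: start a replacement and remap the call id.
            {_NewPid, Refs2} = start_and_register(CallId, Refs1),
            {noreply, Refs2};
        error ->
            {noreply, Refs}
    end;
handle_info(_Msg, Refs) ->
    {noreply, Refs}.

start_and_register(CallId, Refs) ->
    Pid = spawn(fun() -> worker_loop(CallId) end),
    Ref = erlang:monitor(process, Pid),
    ets:insert(?TAB, {CallId, Pid}),
    {Pid, Refs#{Ref => CallId}}.

%% Placeholder for the real back end server process.
worker_loop(CallId) ->
    receive
        stop -> ok;
        _    -> worker_loop(CallId)
    end.
```

The dispatcher would call ensure_started/1 for the first request of a call and lookup/1 for the rest, so all requests with the same call id reach the same process.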


-----Original Message-----
From:  [mailto:] On Behalf Of Kaiduan Xie
Sent: Wednesday, April 07, 2010 1:39 PM
To: erlang-questions
Subject: [erlang-questions] Idea on stateful server load sharing & fail over

Hi, all,

Consider the following case: a system consists of a farm of stateful
call servers and needs to support a massive number of users with zero
downtime, or at least five-nines (99.999%) availability. The number of
servers is dynamic; in other words, servers can come up and go away.
Each server is stateful: it stores information about the call, and
that state should survive a server crash so that in-call features can
still be supported afterwards. The system has a two-tier architecture.
At the front, a stateless dispatcher forwards incoming requests to the
back end stateful servers. So the following questions come up:

1. How should incoming requests be dispatched to the back end servers?
Please note that the back end server is stateful, so all requests in a
call should be dispatched to the same server. The call is identified
by a call id that is a random string. Also, back end servers can be
added or removed dynamically.

2. How to replicate the call state information among back end servers?

3. How does the front end server detect that a back end server is
down? How do back end servers detect that one of their peers is down?
Can we use Erlang's monitor across distributed nodes to achieve that?

Thanks all for the pointers and thoughts,

