[erlang-questions] Maintaining state between application failover

reaperman123456789@REDACTED reaperman123456789@REDACTED
Wed Aug 14 13:20:44 CEST 2013


Assuming a distributed application, how could state between application 
starts due to failover be maintained?

For illustration purposes, consider the following problem:
We want a kind of server that delivers unique numbers. Starting at 0, on 
each request this number is delivered and incremented.
For implementation, we use a gen_server process that keeps the current 
number in its state. We put that process under a one-for-one supervisor, 
which serves as the top supervisor of the application.
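
A minimal sketch of such a server, to make the example concrete. The module 
name `counter` and the function `next/0` are made up for illustration, and 
the supervisor and application modules are left out:

```erlang
-module(counter).
-behaviour(gen_server).

%% Public API (names are hypothetical, chosen for this illustration).
-export([start_link/0, next/0]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the server and register it locally under the module name.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Deliver the current number; the server increments it afterwards.
next() ->
    gen_server:call(?MODULE, next).

%% The state is just the next number to hand out, starting at 0.
init([]) ->
    {ok, 0}.

handle_call(next, _From, N) ->
    {reply, N, N + 1}.

%% No casts are expected; ignore them and keep the state.
handle_cast(_Msg, N) ->
    {noreply, N}.
```

Successive calls to `counter:next()` return 0, 1, 2 and so on, but only 
until the process is restarted, because `init/1` always begins at 0.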

Now, even in a non-distributed setup, the gen_server could not maintain the 
state between restarts managed by its supervisor. We could store the 
current number in the environment of the application itself (which doesn't 
feel right, but for illustration purposes let's keep it in mind), where it 
would survive restarts of the gen_server process.
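
As a rough sketch of that idea, assuming a hypothetical application name 
`counter_app` and environment key `current_number` (neither is from a real 
system), the server could write each new value to the application 
environment and read it back in `init/1`:

```erlang
-module(counter_env).
-behaviour(gen_server).

-export([start_link/0, next/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical application name and environment key.
-define(APP, counter_app).
-define(KEY, current_number).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

next() ->
    gen_server:call(?MODULE, next).

%% Resume from the value stored in the application environment,
%% or from 0 if nothing has been stored yet.
init([]) ->
    {ok, application:get_env(?APP, ?KEY, 0)}.

%% Store the next number before replying, so that a restarted
%% process picks up where this one left off.
handle_call(next, _From, N) ->
    ok = application:set_env(?APP, ?KEY, N + 1),
    {reply, N, N + 1}.

handle_cast(_Msg, N) ->
    {noreply, N}.
```

The environment lives in the memory of the local node only, so this 
survives a restart of the process but not the loss of the node.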
In a distributed setup, even the state stored in the application 
environment would not survive in case of an application failover. When the 
node on which the application is running dies and the application is 
restarted on another node, the counter starts at 0 again.

Performance considerations aside, the current number could be written to a 
state file on every update and read back at startup. But since the nodes 
would usually be running on different machines, on failover the application 
would be restarted on a machine other than the one where the file resides. 
And since the reason for the death of the Erlang node is presumably the 
death of the hardware node, the file would not be accessible from the 
application started in failover mode. To keep the state file accessible 
everywhere, we would need to put it on an NFS mount or something similar, 
but then the NFS server would become a critical component in our setup, not 
to mention the overkill of running an extra machine for the single purpose 
of sharing a single file which would not exceed a few bytes in size. Using 
a database of whatever flavor is essentially the same.
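
For reference, the state-file variant could look roughly like the helper 
below. The module name `counter_file` and its two functions are invented 
for this sketch; the server would call `load/1` in `init/1` and `save/2` 
after each increment:

```erlang
-module(counter_file).
-export([load/1, save/2]).

%% Read the last saved number, or start at 0 if the file does not
%% exist yet. Any other read error crashes the caller.
load(Path) ->
    case file:read_file(Path) of
        {ok, Bin}       -> binary_to_term(Bin);
        {error, enoent} -> 0
    end.

%% Overwrite the state file with the current number.
%% Returns ok or {error, Reason}, as file:write_file/2 does.
save(Path, N) when is_integer(N) ->
    file:write_file(Path, term_to_binary(N)).
```

This only helps as long as the file is reachable from whichever node takes 
over, which is exactly the problem described above.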

So, how could state be maintained efficiently in the Erlang way of doing 
things?

To clarify, I am *not* asking for a solution to the problem of generating 
unique numbers; there are probably a thousand better ways to do this, 
UUID and whatnot. I am asking for ways to maintain state between 
restarts of a distributed application in a failover scenario. The example 
problem above is a special case of the general problem I am asking about, 
made up purely for the purpose of having a simple illustration. A solution 
of the general problem would automatically solve the special case, anyway ;)